Linear regression

Linear regression with a single output

Suppose we have n sets of training data, where the i\rm{th} set of data consists of m inputs {x_1^i,...,x_m^i} and one output y^i. To predict the output for new data, linear regression is used.

Hypothesis

By the linear regression hypothesis, the function h_\theta( {\bf{x}}^i) is represented as

h_\theta({\bf {x}}^i) = \theta_0 +\theta_1 x_1^i + ... + \theta_m x_m^i

where {\bf {x}}^i = [1,x_1^i,...,x_m^i]^T

The vectorized form is

{h_\theta({\bf X})} = {\bf X} { \boldsymbol \theta}

where

{h_\theta({\bf X})} = [ h_\theta({\bf x}^1), ... , h_\theta({\bf x}^n)]^T,

{\boldsymbol \theta} = [\theta_0, ..., \theta_m]^T,

{\bf X} =\left [ {\bf x}^1, ..., {\bf x}^n \right ]^T
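As a minimal sketch of the vectorized hypothesis (the arrays and parameter values below are made up for illustration, not taken from any particular data set), {\bf X} is built by prepending a column of ones to the raw inputs, and h_\theta({\bf X}) is then a single matrix product:

import numpy as np

# Made-up training inputs: n = 4 examples, m = 2 features each
X_raw = np.array([[1.0, 2.0],
                  [2.0, 0.5],
                  [3.0, 1.5],
                  [4.0, 2.5]])

# Design matrix X: prepend a column of ones so theta_0 acts as the intercept
X = np.hstack([np.ones((X_raw.shape[0], 1)), X_raw])

# An arbitrary parameter vector theta = [theta_0, theta_1, theta_2]^T
theta = np.array([0.5, 1.0, -2.0])

# h_theta(X) = X theta gives one prediction per training example
h = X @ theta
print(h)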

The value of \boldsymbol \theta is determined by minimizing the cost function.

Cost function

The mean squared error is chosen as the cost function J(\boldsymbol \theta)

J(\boldsymbol \theta) = \frac{1}{2n}\sum_{i=1}^{n} (h_\theta ({\bf x}^i) - y^i)^2 = \frac{1}{2n}({h_\theta ({\bf X})} - {\bf y})^T({h_\theta ({\bf X})} - {\bf y} )

where {\bf {y}} = [y^1,...,y^n]^T
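A minimal sketch of the vectorized cost, reusing the made-up arrays from the sketch above (again, the values are assumptions for illustration only):

import numpy as np

# Made-up design matrix (leading column of ones), targets, and parameters
X = np.array([[1.0, 1.0, 2.0],
              [1.0, 2.0, 0.5],
              [1.0, 3.0, 1.5],
              [1.0, 4.0, 2.5]])
y = np.array([1.0, 3.0, 2.0, 5.0])
theta = np.array([0.5, 1.0, -2.0])

# J(theta) = (1/(2n)) (X theta - y)^T (X theta - y)
n = X.shape[0]
r = X @ theta - y
J = r @ r / (2 * n)
print(J)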

Find \boldsymbol \theta

To minimize the cost function J({\boldsymbol \theta}), the partial derivatives of J({\boldsymbol \theta}) with respect to each \theta_j are set to zero.

\begin{matrix} &\frac{\partial J(\boldsymbol\theta)}{\partial\theta_0} = \frac{1}{n}\sum_{i=1}^{n}({h_\theta ({\bf x}^i)} - y^i ) = 0\\ &\frac{\partial J(\boldsymbol\theta)}{\partial\theta_1} = \frac{1}{n}\sum_{i=1}^{n}({h_\theta ({\bf x}^i)} - y^i )x_1^i = 0\\ &\vdots \\ &\frac{\partial J(\boldsymbol\theta)}{\partial\theta_m} = \frac{1}{n}\sum_{i=1}^{n}({h_\theta ({\bf x}^i)} - y^i )x_m^i =0\\ \end{matrix}

The vectorized form can be written as

\frac{1}{n}( {\bf X}^T {\bf X} {\boldsymbol \theta} - {\bf X}^T {\bf y} ) = 0

The value of \boldsymbol \theta is

{\boldsymbol \theta} = ({\bf X}^T {\bf X})^{-1} {\bf X}^T {\bf y }
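A minimal sketch of this closed-form solution, with the same made-up arrays as above (np.linalg.pinv is used instead of a plain inverse so the sketch still works if {\bf X}^T {\bf X} is ill-conditioned):

import numpy as np

# Made-up design matrix (leading column of ones) and targets
X = np.array([[1.0, 1.0, 2.0],
              [1.0, 2.0, 0.5],
              [1.0, 3.0, 1.5],
              [1.0, 4.0, 2.5]])
y = np.array([1.0, 3.0, 2.0, 5.0])

# Normal equation: theta = (X^T X)^{-1} X^T y
theta = np.linalg.pinv(X.T @ X) @ X.T @ y
print(theta)

# Predictions on the training inputs
print(X @ theta)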

Linear regression with multiple outputs

Suppose we have n sets of training data, where the i\rm{th} set of data consists of m inputs {x_1^i,...,x_m^i} and k outputs y_1^i,...,y_k^i.

Linear regression with multiple outputs is similar to linear regression with a single output.

The hypothesis function h^j_\theta({\bf x}^i) for the j\rm{th} output, j = 1,...,k, is

h^j_\theta({\bf x}^i) = \theta^j_0 +\theta^j_1 x_1^i + ... + \theta^j_m x_m^i

Each hypothesis h^j_\theta({\bf x}^i) is a single-output hypothesis, and the k hypotheses are independent of one another.

The vectorized form can be written as

{H_\Theta({\bf X})} = {\bf X} \boldsymbol \Theta

where H_\Theta({\bf X}) = [h_\theta^1({\bf X}),...,h_\theta^k({\bf X})], \boldsymbol \Theta = [\boldsymbol \theta^1,...,\boldsymbol \theta^k]

The cost function for multiple-output linear regression can be written as k independent single-output linear regressions, each treated with the same derivative operations as above. Finally, \boldsymbol \Theta can be calculated with

\boldsymbol \Theta = ({\bf X}^T {\bf X})^{-1} {\bf X}^T {\bf Y}

where {\bf Y} = [{\bf y}^1,...,{\bf y}^k].

Python example of linear regression with multiple outputs

import numpy as np
from sklearn import linear_model


# Load training data and test data
f = np.loadtxt('MOF_Training4.dat', unpack=True)
f2 = np.loadtxt('MOF_test.dat', unpack=True)

# X: training input data    y: training output data
# X2: test input data
# (Note: Python slices exclude the stop index, so f[0:3,:] picks rows 0-2
#  and f[3:6,:] picks rows 3-5.)
X = np.transpose(f[3:6,:])
y = np.transpose(f[0:3,:])

# Design matrix XX: X with an extra leading column of ones for the intercept
XX = np.ones((len(X), len(X[0,:])+1))
XX[:,1:4] = X


# Randomly pick some data for testing. Here, 10 sets of data are chosen
n_test = 10
L = np.zeros((n_test, 6))
for i in range(0, n_test):
  L[i,:] = f2[:, np.random.randint(low=1, high=len(f2[0]))]

# X2: test inputs, Y3: exact test outputs
X2 = L[:, 3:6]
Y3 = L[:, 0:3]

##### Calculate with the linear regression equations #####
# Test design matrix XX2: X2 with an extra leading column of ones
XX2 = np.ones((len(X2), len(X2[0,:])+1))
XX2[:,1:4] = X2

# Calculate Theta with the normal equation: Theta = (X^T X)^{-1} X^T Y
XXT = np.transpose(XX)
pinvXX = np.linalg.pinv(np.matmul(XXT, XX))
theta = np.matmul(np.matmul(pinvXX, XXT), y)

# Predict with theta
y2 = np.matmul(XX2, theta)


#### Calculate with the sklearn library ####
# Train with linear regression (sklearn adds the intercept itself,
# so the raw X without the column of ones is used here)
lm = linear_model.LinearRegression()

# Linear fit
lm.fit(X, y)

# Predict the results with the test data input
y3 = lm.predict(X2)


## Outputs 
print(' Input X Centroid, Y centroid, Area')
print(X2)
print(' ')

# Print X norm, Y norm, Intercept predicted with the normal equation
print(' predicted norm X norm, Y norm, Intercept(py)')
print(y2) 
print(' ')

# Print X norm, Y norm, Intercept predicted with sklearn
print(' predicted norm X norm, Y norm, Intercept(sklearn)')
print(y3) 
print(' ')

# Print the exact X norm, Y norm, Intercept from the test data
print(' exact norm X norm, Y norm, Intercept')
print(Y3)
print(' ')

# print(' CPU_time for fit')
# print(timer2-timer1)

# print(' CPU_time for predict')
# print(timer3-timer2)
Sample output:

 Input X Centroid, Y centroid, Area
[[ 0.011728  0.461987 -0.43144 ]
 [ 0.722013  0.09947   0.094082]
 [ 0.5       0.248299 -0.023811]
 [ 0.47532   0.216514 -0.126761]
 [ 0.630344  0.114905 -0.134785]
 [ 0.974811  0.011673  0.009922]
 [ 0.560012  0.219963 -0.003007]
 [ 0.860352  0.063292  0.035395]
 [ 0.964991  0.014939 -0.01494 ]
 [ 0.762323  0.088118  0.080133]]
 
 predicted norm X norm, Y norm, Intercept(py)
[[ 9.34969586e-01 -1.06476457e+00  5.23932912e-01]
 [ 2.01307450e-01  2.32187975e-01 -2.38219364e-01]
 [ 5.02507675e-01 -5.87639280e-02  5.73360462e-06]
 [ 4.38181172e-01 -3.12837524e-01  2.64879459e-02]
 [ 2.32544812e-01 -3.32640210e-01 -1.39856401e-01]
 [ 2.36238249e-02  2.44868209e-02 -5.09477480e-01]
 [ 4.45161260e-01 -7.42107141e-03 -6.43885340e-02]
 [ 1.28090390e-01  8.73524518e-02 -3.86660318e-01]
 [ 3.02335578e-02 -3.68709035e-02 -4.98940392e-01]
 [ 1.78333265e-01  1.97762792e-01 -2.81472928e-01]]
 
 predicted norm X norm, Y norm, Intercept(sklearn)
[[ 9.34969586e-01 -1.06476457e+00  5.23932912e-01]
 [ 2.01307450e-01  2.32187975e-01 -2.38219364e-01]
 [ 5.02507675e-01 -5.87639280e-02  5.73360462e-06]
 [ 4.38181172e-01 -3.12837524e-01  2.64879459e-02]
 [ 2.32544812e-01 -3.32640210e-01 -1.39856401e-01]
 [ 2.36238249e-02  2.44868209e-02 -5.09477480e-01]
 [ 4.45161260e-01 -7.42107141e-03 -6.43885340e-02]
 [ 1.28090390e-01  8.73524518e-02 -3.86660318e-01]
 [ 3.02335578e-02 -3.68709035e-02 -4.98940392e-01]
 [ 1.78333265e-01  1.97762792e-01 -2.81472928e-01]]
 
 exact norm X norm, Y norm, Intercept
[[ 0.874571 -0.484898  0.58    ]
 [ 0.726715  0.686939 -0.18    ]
 [ 0.989948 -0.141429  0.      ]
 [ 0.81037  -0.585918  0.02    ]
 [ 0.664204 -0.747551 -0.1     ]
 [ 0.923384  0.383878 -0.52    ]
 [ 0.999796 -0.020204 -0.06    ]
 [ 0.931526  0.363673 -0.34    ]
 [ 0.707071 -0.707143 -0.52    ]
 [ 0.745295  0.666735 -0.22    ]]
 