Multiple Linear Regression

Multiple Regression

import statsmodels.api as sm
from ISLP import load_data
from ISLP.models import ModelSpec as MS, summarize

Boston = load_data('Boston')
Y = Boston['medv']  # response: median home value

new_X = MS(['lstat', 'age']).fit_transform(Boston)
model1 = sm.OLS(Y, new_X)
results1 = model1.fit()
summarize(results1)
variables = Boston.columns.drop('medv')
variables
new_X = MS(variables).fit_transform(Boston)  # design matrix with all predictors
new_X
model = sm.OLS(Y, new_X)
results = model.fit()
summarize(results)
variables_no_age = Boston.columns.drop(['medv', 'age'])
Xma = MS(variables_no_age).fit_transform(Boston)  # all predictors except age
model1 = sm.OLS(Y, Xma)
results2 = model1.fit()
summarize(results2)

Multivariate Goodness of Fit

print(new_X.shape)
dir(results)
results.rsquared
new_X.shape[1]  # number of columns in the design matrix (including the intercept)
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor as VIF

vals = [VIF(new_X, i)
        for i in range(1, new_X.shape[1])]  # skip the intercept column
vif = pd.DataFrame({'variance_inflation_factor': vals}, index=new_X.columns[1:])
vif
vals = []
for i in range(1, new_X.shape[1]):  # same as the list comprehension above
    vals.append(VIF(new_X, i))
vif = pd.DataFrame({'variance_inflation_factor': vals}, index=new_X.columns[1:])
vif

Interaction Terms

  • Including interaction terms in a linear model relaxes the additive assumption of linear regression
  • Combining predictors through interactions can improve predictions when one predictor's effect depends on another
X_inter = MS(['lstat', 'age', ('lstat', 'age')]).fit_transform(Boston)  # tuple adds the lstat:age interaction
X_inter
model3 = sm.OLS(Y, X_inter)
results3 = model3.fit()
summarize(results3)

Non-Linear Transformations of the Predictors

  • Extending the linear regression model using Polynomial Regression
from ISLP.models import poly

X_poly = MS([poly('lstat', degree=2), 'age']).fit_transform(Boston)
X_poly
model4 = sm.OLS(Y, X_poly)
results4 = model4.fit()
summarize(results4)
  • The near-zero p-value on the quadratic term suggests that it leads to an improved model

Using anova_lm() to quantify and compare the fits of the two models

from statsmodels.stats.anova import anova_lm

anova_lm(results1, results4)  # smaller (linear) model first, larger (quadratic) model second
  • anova_lm() performs a hypothesis test comparing the two models
  • The null hypothesis: the quadratic term in the results4 model is not needed
  • The alternative hypothesis: the model with the quadratic term is superior
  • The NaN entries in the first row mean there is no previous model to compare to
  • anova_lm() can compare more than two models at once
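The F-statistic that anova_lm() reports can be reproduced by hand from the residual sums of squares of the two nested models. A minimal sketch with synthetic data and hypothetical variable names:

```python
# F = ((RSS_small - RSS_big) / q) / (RSS_big / df_resid_big),
# where q is the number of extra terms in the bigger model.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(1)
n = 100
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + 0.5 * x**2 + rng.normal(size=n)
df = pd.DataFrame({'x': x, 'y': y})

small = smf.ols('y ~ x', data=df).fit()            # linear term only
big = smf.ols('y ~ x + I(x**2)', data=df).fit()    # adds the quadratic term

table = anova_lm(small, big)
q = big.df_model - small.df_model
F_manual = ((small.ssr - big.ssr) / q) / (big.ssr / big.df_resid)
print(table)
print(F_manual)  # matches the F value in the second row of the table
```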
from matplotlib.pyplot import subplots

ax = subplots(figsize=(8, 8))[1]
ax.scatter(results4.fittedvalues, results4.resid)
ax.set_xlabel('Fitted values')
ax.set_ylabel('Residuals')
ax.axhline(0, c='k', ls='--');
  • Notice that the polynomial fit reduces the pattern in the residuals, indicating a more accurate model

Qualitative Predictors

Carseats = load_data('Carseats')
Carseats.columns
all_variables = list(Carseats.columns.drop('Sales'))
Y = Carseats['Sales']
model_variables = all_variables + [('Income', 'Advertising'), ('Price', 'Age')]
X = MS(model_variables).fit_transform(Carseats)
X
  • MS automatically transforms qualitative features using one-hot encoding
  • Each category gets a vector where one element is set to 1 and all others 0
  • Example: Red(1,0,0) Blue(0,1,0) Green(0,0,1)
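A minimal sketch of this encoding using pandas' get_dummies on a toy column (MS does the equivalent internally; note that with an intercept in the model, one baseline category is dropped, which is why the Carseats output shows ShelveLoc[Good] and ShelveLoc[Medium] but not ShelveLoc[Bad]):

```python
# One-hot encoding of a toy categorical column (not the Carseats data).
import pandas as pd

toy = pd.DataFrame({'Color': ['Red', 'Blue', 'Green', 'Blue']})
dummies = pd.get_dummies(toy['Color'])
print(dummies)  # one column per category; exactly one "hot" entry per row
```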
model = sm.OLS(Y,X)
results = model.fit()
summarize(results)
  • The large positive coefficient for ShelveLoc[Good] indicates that a good shelving location increases sales