Generative Models for Classification

Linear Discriminant Analysis

We now perform LDA on the Smarket data

from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA

lda = LDA(store_covariance=True)

X_train, X_test = [M.drop(columns=['intercept'], errors='ignore')
                   for M in [X_train, X_test]]
lda.fit(X_train, L_train)
X_train
lda.means_
  • lda.means_ gives the average of each predictor within each class, used as the estimates of the class means $\hat{\mu}_{k}$
  • Down and Up are the two classes that the LDA distinguishes
lda.priors_ 
lda.classes_
lda.scalings_
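As a sketch on synthetic data (not the Smarket set), the fitted attributes can be checked against direct sample statistics: priors_ are the class frequencies and means_ are the per-class feature averages.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
# Two synthetic classes with different means and a shared covariance
X = np.vstack([rng.normal(0.0, 1.0, size=(60, 2)),
               rng.normal(1.5, 1.0, size=(40, 2))])
y = np.array(['Down'] * 60 + ['Up'] * 40)

lda = LinearDiscriminantAnalysis(store_covariance=True)
lda.fit(X, y)

# classes_ is sorted alphabetically, so index 0 is 'Down'
# priors_ matches the class frequencies in the training data
assert np.allclose(lda.priors_, [0.6, 0.4])
# means_ holds the per-class sample means of each predictor
assert np.allclose(lda.means_[0], X[y == 'Down'].mean(axis=0))
print("priors:", lda.priors_)
```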
lda_pred = lda.predict(X_test)
lda_pred
confusion_table(lda_pred, L_test)
lda_probs = lda.predict_proba(X_test)
np.all(np.where(lda_probs[:, 1] >= 0.5, 'Up', 'Down') == lda_pred)
np.all([lda.classes_[i] for i in np.argmax(lda_probs, 1)] == lda_pred)
  • Each row of lda_probs represents a sample and each column a class
  • This lets us change the threshold to better suit the problem at hand
np.sum(lda_probs[:,0]>0.52)
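The thresholding idea above can be sketched on synthetic data (the 0.5 cutoff matching predict relies on classes_ being sorted, with 'Up' in column 1):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, size=(60, 2)),
               rng.normal(1.0, 1.0, size=(60, 2))])
y = np.array(['Down'] * 60 + ['Up'] * 60)

lda = LinearDiscriminantAnalysis().fit(X, y)
probs = lda.predict_proba(X)          # column 1 is P(class == 'Up')

# Default rule: predict 'Up' when P(Up) >= 0.5 -- reproduces lda.predict
default = np.where(probs[:, 1] >= 0.5, 'Up', 'Down')
assert np.all(default == lda.predict(X))

# A stricter 0.9 threshold predicts 'Up' only when the model is very sure
strict = np.where(probs[:, 1] >= 0.9, 'Up', 'Down')
print("Up predictions at 0.5:", (default == 'Up').sum(),
      "| at 0.9:", (strict == 'Up').sum())
```

Raising the threshold can never increase the number of 'Up' predictions, so it trades recall on 'Up' for higher confidence in each positive call.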

Quadratic Discriminant Analysis

from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis as QDA

qda = QDA(store_covariance=True)
qda.fit(X_train, L_train)
qda.means_, qda.priors_
qda.covariance_[0]
qda.covariance_[1]
qda_pred = qda.predict(X_test)
confusion_table(qda_pred, L_test)
  • The predict method computes the discriminant function $\delta_{k}(x)$ for each class and assigns the sample to the class $k$ with the largest value
  • QDA predicts $151$ results correctly and misclassifies $101$
  • QDA performed better than the LDA, with about $60\%$ accuracy
151/252
np.mean(qda_pred == L_test)
  • The QDA beats the LDA and logistic regression here
  • But it needs more evaluation on a bigger test data set
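To make the discriminant-function bullet concrete, here is a sketch on synthetic data: a hand-written $\delta_{k}(x)$ (log prior plus log Gaussian density for class $k$, dropping constants shared across classes) whose argmax reproduces QDA's predict.

```python
import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0.0, 1.0, size=(70, 2)),
               rng.normal(1.0, 2.0, size=(50, 2))])
y = np.repeat([0, 1], [70, 50])

qda = QuadraticDiscriminantAnalysis(store_covariance=True).fit(X, y)

def delta(x, k):
    """Quadratic discriminant for class k, using the fitted attributes."""
    mu = qda.means_[k]
    Sigma = qda.covariance_[k]
    diff = x - mu
    return (np.log(qda.priors_[k])
            - 0.5 * np.log(np.linalg.det(Sigma))
            - 0.5 * diff @ np.linalg.solve(Sigma, diff))

# Assign each sample to the class with the largest discriminant value
manual = np.array([np.argmax([delta(x, k) for k in range(2)]) for x in X])
assert np.all(qda.classes_[manual] == qda.predict(X))
print("argmax delta_k matches qda.predict on all", len(X), "samples")
```

Because each class gets its own $\Sigma_{k}$, the term quadratic in $x$ no longer cancels between classes, which is what makes the decision boundary quadratic rather than linear.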

Naive Bayes

from sklearn.naive_bayes import GaussianNB

NB = GaussianNB()
NB.fit(X_train, L_train)
NB.classes_
NB.class_prior_
NB.theta_
NB.var_
NB?
X_train[L_train=='Down'].mean()
X_train[L_train=='Up'].mean()
X_train[L_train=='Down'].var(ddof=0)
X_train[L_train=='Up'].var(ddof=0)
nb_labels = NB.predict(X_test)
confusion_table(nb_labels, L_test)
  • The Gaussian Naive Bayes results in $150$ correct predictions
  • Which is around $59\%$ accuracy
  • Better than the LDA and slightly worse than QDA
NB.predict_proba(X_test)[:5]
