Generative Models for Classification
Linear Discriminant Analysis
We now perform LDA on the `Smarket` data
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
from ISLP import confusion_table
lda = LDA(store_covariance=True)
X_train, X_test = [M.drop(columns=['intercept'], errors='ignore')
                   for M in [X_train, X_test]]
lda.fit(X_train,L_train)
X_train
lda.means_
- Prior probabilities $\hat{\pi}_{k}$ for the two classes
- `Down` and `Up` are the classes that LDA predicts
lda.priors_
lda.classes_
lda.scalings_
lda_pred = lda.predict(X_test)
lda_pred
confusion_table(lda_pred, L_test)
lda_probs = lda.predict_proba(X_test)
np.all(np.where(lda_probs[:,1]>=0.5,'Up','Down')==lda_pred)
np.all([lda.classes_[i] for i in np.argmax(lda_probs,1)]== lda_pred)
- Each row of `lda_probs` represents a sample and each column represents a class
- This lets us change the threshold to better suit the problem
np.sum(lda_probs[:,0]>0.52)
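As a sketch of how a custom threshold changes the predictions, using a small hypothetical posterior matrix in place of `lda_probs` (the values below are illustrative, not the Smarket fit):

```python
import numpy as np

# Hypothetical posterior matrix shaped like lda_probs:
# column 0 = P(Down | x), column 1 = P(Up | x)
probs = np.array([[0.55, 0.45],
                  [0.51, 0.49],
                  [0.40, 0.60]])

# Default rule: predict 'Up' when P(Up) >= 0.5
default_pred = np.where(probs[:, 1] >= 0.5, 'Up', 'Down')

# Stricter rule: only predict 'Down' when P(Down) > 0.52
strict_down = np.where(probs[:, 0] > 0.52, 'Down', 'Up')

print(default_pred)  # ['Down' 'Down' 'Up']
print(strict_down)   # ['Down' 'Up' 'Up']
```

The second row flips from `Down` to `Up` under the stricter rule, showing how raising the threshold trades recall on one class for precision.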
Quadratic Discriminant Analysis
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis as QDA
qda = QDA(store_covariance=True)
qda.fit(X_train,L_train)
qda.means_, qda.priors_
qda.covariance_[0]
qda.covariance_[1]
qda_pred = qda.predict(X_test)
confusion_table(qda_pred, L_test)
- The `predict` method calculates the discriminant function $\delta_{k}(x)$ for each class and assigns each observation to the class $k$ with the largest value
- QDA predicts $151$ observations correctly and misclassifies $101$
- QDA performs better than LDA, with about $60\%$ accuracy
(151)/252
np.mean(qda_pred == L_test)
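The discriminant rule described above can be checked by hand. A sketch on synthetic two-class data (a stand-in for the Smarket lags), computing $\delta_{k}(x) = -\tfrac{1}{2}\log|\Sigma_{k}| - \tfrac{1}{2}(x-\mu_{k})^{T}\Sigma_{k}^{-1}(x-\mu_{k}) + \log\hat{\pi}_{k}$ from the fitted attributes and comparing its argmax with `predict`:

```python
import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis as QDA

rng = np.random.default_rng(0)
# Synthetic two-class data; the means/scales are illustrative
X0 = rng.normal(loc=-1.0, scale=1.0, size=(50, 2))
X1 = rng.normal(loc=1.0, scale=2.0, size=(50, 2))
X = np.vstack([X0, X1])
y = np.array(['Down'] * 50 + ['Up'] * 50)

qda = QDA(store_covariance=True).fit(X, y)

def delta(x, k):
    """Quadratic discriminant delta_k(x) built from the fitted class k parameters."""
    mu = qda.means_[k]
    Sigma = qda.covariance_[k]
    diff = x - mu
    return (-0.5 * np.log(np.linalg.det(Sigma))
            - 0.5 * diff @ np.linalg.solve(Sigma, diff)
            + np.log(qda.priors_[k]))

# Assigning each point to the class with the largest delta_k
manual = np.array([qda.classes_[np.argmax([delta(x, k) for k in range(2)])]
                   for x in X])
print(np.all(manual == qda.predict(X)))  # True
```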
- QDA beats both LDA and logistic regression here
- But this needs further evaluation on a larger test set
Naive Bayes
from sklearn.naive_bayes import GaussianNB
NB = GaussianNB()
NB.fit(X_train, L_train)
NB.classes_
NB.class_prior_
NB.theta_
NB.var_
NB?
X_train[L_train=='Down'].mean()
X_train[L_train=='Up'].mean()
X_train[L_train=='Down'].var(ddof=0)
X_train[L_train=='Up'].var(ddof=0)
nb_labels = NB.predict(X_test)
confusion_table(nb_labels,L_test)
- Gaussian Naive Bayes results in $150$ correct predictions
- Which is around $59\%$ accuracy
- Better than LDA and slightly worse than QDA
NB.predict_proba(X_test)[:5]
