Unstable accuracy of Gaussian mixture model classifier from sklearn


aspartic acid

I have some data from two different speakers (MFCC features for speaker recognition). Each speaker has 60 vectors of 13 features (120 in total), and each speaker has its own label (0 and 1). I need to display the results in a confusion matrix. But sklearn's GaussianMixture model is not stable: every time I run the program I get a different score (sometimes the accuracy is 0.4, sometimes 0.7...). I don't know what I'm doing wrong, because I built SVM and k-NN models the same way and they work fine (stable accuracy around 0.9). Do you know what I'm doing wrong?

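from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.mixture import GaussianMixture

gmmclf = GaussianMixture(n_components=2, covariance_type='diag')
gmmclf.fit(X_train, y_train)  # X_train are MFCC vectors, y_train are labels

# predict() returns the index of the most likely mixture component for each test vector
ygmm_pred_class = gmmclf.predict(X_test)
print(accuracy_score(y_test, ygmm_pred_class))
print(confusion_matrix(y_test, ygmm_pred_class))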
desert boat

Short answer: you should simply not use GMM for classification.


Long answer:

From an answer in a related thread, Multiclass classification using Gaussian Mixture Models with scikit learn (emphasis in the original):

Gaussian mixture is not a classifier. It is a density estimation method, and expecting its components to magically align with your classes is not a good idea. [...] GMM just tries to fit a mixture of Gaussians to your data, but nothing forces it to place them according to the labels (which are not even provided in the fit call). Sometimes this works, but only for small problems, where the classes are so well separated that even Naive Bayes would work; in general it is just an ineffective tool for the problem.
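To illustrate why the reported accuracy can swing, here is a minimal sketch on hypothetical, deliberately easy synthetic data (not the asker's MFCCs). `GaussianMixture.predict` returns component indices, and nothing ties component 0 to class 0: with two classes, the same clustering scores either a or 1 - a depending on which cluster happened to receive index 0. On harder, overlapping data such as real MFCCs, the components need not line up with the speakers at all, even after swapping.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.mixture import GaussianMixture

# Hypothetical, deliberately easy data: two well-separated 2-D blobs, labels 0 and 1
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-5.0, 1.0, (60, 2)), rng.normal(5.0, 1.0, (60, 2))])
y = np.repeat([0, 1], 60)

results = {}
for seed in range(5):
    gmm = GaussianMixture(n_components=2, covariance_type="diag", random_state=seed)
    pred = gmm.fit(X).predict(X)  # component indices; y is never used
    raw = accuracy_score(y, pred)
    swapped = accuracy_score(y, 1 - pred)  # same clustering with indices 0 <-> 1 swapped
    results[seed] = (raw, swapped)
    print(f"seed={seed}: raw accuracy={raw:.2f}, with indices swapped={swapped:.2f}")
```

Even here, where the clusters are found correctly, the raw score is only meaningful once the component indices have been mapped to labels.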

And from the comments of the respondent himself (again, emphasis in the original):

As stated in the answer, GMM is not a classifier, so asking whether you are using a "GMM classifier" correctly is impossible to answer. Using GMM as a classifier is incorrect by definition, and there is no "efficient" way of using it in a problem like this, as it is not what this model was designed for. What you can do is build a proper generative model for each class. In other words, build your own classifier where you fit one GMM per label, then use the assigned probabilities to do the actual classification. Then it is a proper classifier. See github.com/scikit-learn/scikit-learn/pull/2468
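A minimal sketch of the per-class approach described in that comment, assuming sklearn's `GaussianMixture`: fit one mixture per label, then predict the class with the highest log p(x | class) + log P(class). The class name `PerClassGMMClassifier` and the synthetic 13-dimensional data are hypothetical stand-ins (not the asker's MFCCs), and this is not the code from the linked pull request.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.mixture import GaussianMixture


class PerClassGMMClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical generative classifier: one GaussianMixture per class label."""

    def __init__(self, n_components=1, covariance_type="diag", random_state=None):
        self.n_components = n_components
        self.covariance_type = covariance_type
        self.random_state = random_state

    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        self.classes_ = np.unique(y)
        self.models_, log_priors = [], []
        for c in self.classes_:
            X_c = X[y == c]  # the labels decide which density each sample trains
            gmm = GaussianMixture(
                n_components=self.n_components,
                covariance_type=self.covariance_type,
                random_state=self.random_state,
            ).fit(X_c)
            self.models_.append(gmm)
            log_priors.append(np.log(len(X_c) / len(X)))  # log P(class)
        self.log_priors_ = np.array(log_priors)
        return self

    def _joint_log_likelihood(self, X):
        # score_samples gives log p(x | class); adding log P(class) gives log p(x, class)
        X = np.asarray(X)
        return np.column_stack([m.score_samples(X) for m in self.models_]) + self.log_priors_

    def predict(self, X):
        return self.classes_[np.argmax(self._joint_log_likelihood(X), axis=1)]

    def predict_proba(self, X):
        jll = self._joint_log_likelihood(X)
        jll -= jll.max(axis=1, keepdims=True)  # stabilise before exponentiating
        proba = np.exp(jll)
        return proba / proba.sum(axis=1, keepdims=True)


# Usage on hypothetical data shaped like the question's (2 speakers x 60 vectors x 13 features)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (60, 13)), rng.normal(1.5, 1.0, (60, 13))])
y = np.repeat([0, 1], 60)
idx = rng.permutation(len(y))
train, test = idx[:90], idx[90:]

clf = PerClassGMMClassifier(n_components=2, random_state=0).fit(X[train], y[train])
acc = clf.score(X[test], y[test])  # ClassifierMixin.score computes accuracy
print(f"test accuracy: {acc:.2f}")
```

Because each mixture only ever sees one speaker's vectors, the prediction is a real class label rather than an arbitrary component index, and the result can be fed straight into `confusion_matrix`.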

(For what it's worth, you might want to note that the respondent was a research scientist at DeepMind and was the first to earn the machine-learning gold badge on SO.)

To elaborate further (that's why I didn't simply mark the question as a duplicate):

True, there is an entry called GMM classification in the scikit-learn documentation:

Demonstration of Gaussian mixture models for classification.

I guess this did not exist back in 2017, when the above response was written. But if you dig into the provided code, you will see that the GMM models are actually used there in the way proposed by lejlot above: there is no statement of the form classifier.fit(X_train, y_train); all usage is of the form classifier.fit(X_train), i.e. the actual labels are not passed to fit.

This is exactly what we would expect from a clustering-like algorithm (which GMM indeed is), and not from a classifier. It is true that scikit-learn again offers an option for providing labels in the GMM fit method:

fit(self, X, y=None)

which you have actually used here (and which, again, probably did not exist back in 2017, as the above response implies). But given what we know about GMMs and their usage, it is not exactly clear what this parameter is there for (and, allow me to say, scikit-learn has its share of practices that may look sensible from a purely programming perspective, but make very little sense from a modeling perspective).
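A quick sketch to check that the labels make no difference: the `GaussianMixture.fit` docstring describes `y` as not used and present only for API consistency, so fitting with and without `y` under the same `random_state` should give identical parameters. The data here are random placeholders.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
X = rng.normal(size=(120, 13))  # placeholder features
y = np.repeat([0, 1], 60)       # placeholder labels

# Same hyperparameters and seed; the only difference is whether y is passed
with_y = GaussianMixture(n_components=2, covariance_type="diag", random_state=0).fit(X, y)
without_y = GaussianMixture(n_components=2, covariance_type="diag", random_state=0).fit(X)

same = np.allclose(with_y.means_, without_y.means_) and np.allclose(
    with_y.covariances_, without_y.covariances_
)
print(same)  # True: the labels had no effect on the fitted model
```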

Final remark: while fixing the random seed (as suggested in the comments) may seem to "work", it is probably not a good idea to trust a "classifier" whose accuracy ranges between 0.4 and 0.7 depending on the random seed...
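A minimal sketch of that point on hypothetical, overlapping 13-dimensional data: setting `random_state` makes any single run reproducible, but it merely selects one of the possible fits, and the accuracy can still differ from seed to seed. The helper `gmm_accuracy` is hypothetical, written only for this illustration.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.mixture import GaussianMixture

# Hypothetical overlapping two-class data (stand-in for MFCC vectors)
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0.0, 1.0, (60, 13)), rng.normal(0.3, 1.0, (60, 13))])
y = np.repeat([0, 1], 60)


def gmm_accuracy(seed):
    """Accuracy of the raw component indices against the labels for one random_state."""
    gmm = GaussianMixture(n_components=2, covariance_type="diag", random_state=seed).fit(X)
    return accuracy_score(y, gmm.predict(X))


accs = [gmm_accuracy(s) for s in range(10)]
print("same seed, same result:", gmm_accuracy(0) == gmm_accuracy(0))
print("accuracy across seeds: min=%.2f max=%.2f" % (min(accs), max(accs)))
```

Fixing the seed only hides the spread across seeds; it does not make the components correspond to the speakers.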
