Understand the concept of Gaussian mixture models


StuckInPhD

I am trying to understand GMM by reading online resources. I have implemented clustering using K-Means and am looking at GMM vs K-means comparison.

This is what I understand, please let me know if my concept is wrong:

GMM is like KNN in the sense that clustering can be achieved in both cases. But in GMM, each cluster has its own independent mean and covariance. Furthermore, k-means makes a hard assignment of data points to clusters, whereas in GMM we obtain a collection of independent Gaussian distributions, and each data point has a probability of belonging to each of these distributions.

To understand it better, I coded it in MATLAB to achieve the clustering. I use the SIFT function for feature extraction, and k-means clustering to initialize the values (following the VLFeat documentation):

%images is a 459 x 1 cell array where each cell contains the training image
[locations, all_feats] = vl_dsift(single(images{1}), 'fast', 'step', 50); %all_feats will be 128 x no. of keypoints detected
for i=2:(size(images,1))
    [locations, feats] = vl_dsift(single(images{i}), 'fast', 'step', 50);
    all_feats = cat(2, all_feats, feats); %cat column wise all features
end

numClusters = 50; %Just a random selection.
% Run KMeans to pre-cluster the data
[initMeans, assignments] = vl_kmeans(single(all_feats), numClusters, ...
    'Algorithm','Lloyd', ...
    'MaxNumIterations',5);

initMeans = double(initMeans); %GMM needs it to be double

% Find the initial means, covariances and priors
for i=1:numClusters
    data_k = all_feats(:,assignments==i);
    initPriors(i) = size(data_k,2) / numClusters;

    if isempty(data_k)
        % empty cluster: fall back to the covariance of all the features
        initCovariances(:,i) = diag(cov(double(all_feats')));
    else
        initCovariances(:,i) = diag(cov(double(data_k')));
    end
end

% Run EM starting from the given parameters
[means,covariances,priors,ll,posteriors] = vl_gmm(double(all_feats), numClusters, ...
    'initialization','custom', ...
    'InitMeans',initMeans, ...
    'InitCovariances',initCovariances, ...
    'InitPriors',initPriors);

Based on the above I now have means, covariances and priors. My main question is: now what? I'm a little lost.

Also, the means and covariances are each of size 128 x 50. I expected them to be 1 x 50, because each column is a cluster, so wouldn't each cluster have only one mean and covariance? (I know 128 is the size of the SIFT descriptor, but I was still expecting a single mean and covariance per cluster.)

In k-means, I used the MATLAB command knnsearch(X,Y), which basically finds the nearest neighbour in X for each point in Y.

So how do I achieve this in GMM? I know it is a collection of probabilities, and of course the closest match to those probabilities will be our winning cluster. This is where I get confused. All the online tutorials teach how to obtain the means and covariances values, but don't say much about how to actually use them in terms of clustering.

thanks

Amro

I think it would help if you first looked at what a GMM model represents. I'll be using functions from the Statistics Toolbox, but you should be able to do the same with VLFeat.

Let's start with a mixture of two 1D normal distributions. Each Gaussian is represented by a pair of mean and variance. The mixture assigns a weight to each component (the prior).

For example, let's mix two normal distributions with equal weights (p = [0.5; 0.5]), the first centered at 0 and the second at 5 (mu = [0; 5]), with variances of 1 and 2 for the first and second distributions respectively (sigma = cat(3, 1, 2)).

As you can see below, the mean effectively shifts the distribution, while the variance determines how wide/narrow and flat/sharp it is. The prior sets the mixing proportions used to obtain the final combined model.

% create GMM
mu = [0; 5];
sigma = cat(3, 1, 2);
p = [0.5; 0.5];
gmm = gmdistribution(mu, sigma, p);

% view PDF
ezplot(@(x) pdf(gmm,x));

2 Mixtures of 1D Gaussians

The idea behind EM clustering is that each distribution represents a cluster. So in the above example with one-dimensional data, given an instance x = 0.5, we would assign it to the first cluster/component with 99.5% probability:

>> x = 0.5;
>> posterior(gmm, x)
ans =
    0.9950    0.0050    % probability x came from each component

You can see how the instance falls well under the first bell curve. Whereas if you take a point in the middle, the answer is more ambiguous (the point is assigned to class 2, but with much more uncertainty):

>> x = 2.2
>> posterior(gmm, 2.2)
ans =
    0.4717    0.5283

The same concept extends to higher dimensions with multivariate normal distributions. In more than one dimension, the covariance matrix is a generalization of the variance, in order to account for inter-dependencies between the features.

Here is another example where two MVN distributions are mixed in two dimensions:

% first distribution is centered at (0,0), second at (3,3)
mu = [0 0; 3 3];

% covariance of first is identity matrix, second diagonal
sigma = cat(3, eye(2), [5 0; 0 1]);

% again I'm using equal priors
p = [0.5; 0.5];

% build GMM
gmm = gmdistribution(mu, sigma, p);

% 2D projection
ezcontourf(@(x,y) pdf(gmm,[x y]));

% view PDF surface
ezsurfc(@(x,y) pdf(gmm,[x y]));

2D Gaussian Mixture 2

There is some intuition behind how the covariance matrix affects the shape of the joint density function. For instance in 2D, if the matrix is diagonal it implies that the two dimensions don't co-vary. In that case the PDF looks like an axis-aligned ellipse, stretched out either horizontally or vertically according to which dimension has the bigger variance. If the variances are equal, the shape is a perfect circle (the distribution spreads out in both dimensions at an equal rate). Finally, if the covariance matrix is arbitrary (non-diagonal, but still symmetric by definition), it looks like a stretched ellipse rotated at some angle.
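If you want to see this directly, here is a small side-by-side sketch of my own (not part of the example above) using mvnpdf from the Statistics Toolbox:

% contrast the three covariance shapes: circle, axis-aligned ellipse, rotated ellipse
mu = [0 0];
sigmas = cat(3, eye(2), [3 0; 0 0.5], [1 0.8; 0.8 1]);
labels = {'identity: circle', 'diagonal: axis-aligned ellipse', 'full: rotated ellipse'};
for k = 1:3
    subplot(1,3,k)
    ezcontour(@(x,y) mvnpdf([x y], mu, sigmas(:,:,k)), [-4 4 -4 4]);
    title(labels{k})
end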

So in the previous figure, you should be able to tell the two "bumps" apart and which individual distribution each represents. When you go to 3D and higher dimensions, think of it as representing (hyper-)ellipsoids in N dimensions.

2d covariance matrix


Now when you're performing clustering using GMM, the goal is to find the model parameters (the mean and covariance of each distribution, as well as the priors) so that the resulting model best fits the data. The best-fit estimation translates into maximizing the likelihood of the data given the GMM model (i.e. you choose the model that maximizes Pr(data|model)).

As others have explained, this is solved iteratively using the EM algorithm. EM starts with an initial estimate or guess of the parameters of the mixture model. It iteratively re-scores the data instances against the mixture density produced by the parameters. The re-scored instances are then used to update the parameter estimates. This is repeated until the algorithm converges.
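If it helps to see those two steps concretely, here is a rough, self-contained sketch of EM for a 1D two-component mixture (my own toy illustration, not the Statistics Toolbox implementation; the implicit expansion used below requires R2016b or newer):

% toy sample roughly matching the 1D mixture from the beginning of this answer
x = [randn(100,1); 5 + sqrt(2)*randn(100,1)];
mu = [min(x); max(x)];  sig2 = [1; 1];  p = [0.5; 0.5];   % initial guesses
for iter = 1:100
    % E-step: responsibility of each component for each point
    lik  = [p(1)*normpdf(x, mu(1), sqrt(sig2(1))), ...
            p(2)*normpdf(x, mu(2), sqrt(sig2(2)))];
    resp = lik ./ sum(lik, 2);
    % M-step: re-estimate the parameters from the re-scored (weighted) points
    Nk   = sum(resp, 1)';
    mu   = (resp' * x) ./ Nk;
    sig2 = sum(resp .* (x - mu').^2, 1)' ./ Nk;
    p    = Nk / numel(x);
end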

Unfortunately the EM algorithm is very sensitive to the initialization of the model, so it might take a long time to converge if you set poor initial values, or even get stuck in local optima. A better way to initialize the GMM parameters is to use k-means as a first step (like you've shown in your code), and use the mean/covariance of the resulting clusters to initialize EM.

As with other cluster analysis techniques, we first need to decide on the number of clusters to use. Cross-validation is a robust way to find a good estimate of the number of clusters.
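For instance, here is a rough sketch of one simple hold-out version of that idea (my own illustration; it reuses the 2D mixture gmm defined above purely as a toy data generator):

% pick the number of clusters via held-out likelihood (a simple form of cross-validation)
X = random(gmm, 500);                        % sample toy 2D data from the mixture above
idx = randperm(size(X,1));
train = X(idx(1:350),:);  test = X(idx(351:end),:);
nll = zeros(1,5);
for K = 1:5
    gmmK = fitgmdist(train, K, 'RegularizationValue',0.01, ...
        'Options',statset('MaxIter',500));
    nll(K) = -sum(log(pdf(gmmK, test)));     % held-out negative log-likelihood
end
[~, bestK] = min(nll)                        % likely to favor K = 2 for this toy data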

EM clustering suffers from the fact that there are a lot of parameters to fit, and it usually requires lots of data and many iterations to get good results. An unconstrained model with M mixtures and D-dimensional data involves fitting D*D*M + D*M + M parameters (M covariance matrices each of size DxD, plus M mean vectors of length D, plus a vector of priors of length M). For example, with the D = 128 SIFT dimensions and M = 50 clusters from your code, that is 128*128*50 + 128*50 + 50 = 825,650 free parameters. That can be a problem for datasets with a large number of dimensions. So it is customary to impose restrictions and assumptions to simplify the problem (a sort of regularization to avoid overfitting). For instance, you could restrict the covariance matrices to be diagonal, or even have the covariance matrices shared across all Gaussians.
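With fitgmdist such restrictions are just name-value options; a quick sketch of mine (using the current option names; data and K are as in the complete example further below):

% restrict the model to reduce the number of parameters to fit
gmmDiag   = fitgmdist(data, K, 'CovarianceType','diagonal');   % diagonal covariance matrices
gmmShared = fitgmdist(data, K, 'SharedCovariance',true);       % one covariance shared by all components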

Finally, once the mixture model has been fitted, we can explore the clusters by computing the posterior probability of each data instance under each mixture component (as I showed in the 1D example). The GMM then assigns each instance to a cluster according to this "membership" probability.
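Tying this back to the VLFeat code in your question: the posteriors output of vl_gmm already contains these membership probabilities (as far as I recall, one column per data point; check your VLFeat version's documentation), so a hard assignment is just a max over components. For a new descriptor you can score it against each weighted Gaussian yourself; here is a sketch of mine, where x stands for a hypothetical new 128x1 SIFT descriptor:

% hard-assign each training point to its most probable component
% (assumes posteriors is numClusters-by-numPoints)
[~, clusterIdx] = max(posteriors, [], 1);

% score a hypothetical new 128x1 descriptor x against each diagonal Gaussian,
% working in the log domain since 128-D densities underflow otherwise
logscore = zeros(numClusters, 1);
for k = 1:numClusters
    d = double(x) - means(:,k);
    logscore(k) = log(priors(k)) ...
        - 0.5*sum(log(2*pi*covariances(:,k))) ...
        - 0.5*sum((d.^2) ./ covariances(:,k));
end
[~, bestCluster] = max(logscore);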


Here is a more complete example of data clustering using a Gaussian mixture model:

% load Fisher Iris dataset
load fisheriris

% project it down to 2 dimensions for the sake of visualization
[~,data] = pca(meas,'NumComponents',2);
mn = min(data); mx = max(data);
D = size(data,2);    % data dimension    

% initial k-means step used to initialize EM
K = 3;               % number of mixtures/clusters
cInd = kmeans(data, K, 'EmptyAction','singleton');

% fit a GMM model
gmm = fitgmdist(data, K, 'Options',statset('MaxIter',1000), ...
    'CovType','full', 'SharedCov',false, 'Regularize',0.01, 'Start',cInd);

% means, covariances, and mixing-weights
mu = gmm.mu;
sigma = gmm.Sigma;
p = gmm.PComponents;

% cluster and posterior probability of each instance
% note that: [~,clustIdx] = max(p,[],2)
[clustInd,~,p] = cluster(gmm, data);
tabulate(clustInd)

% plot data, clustering of the entire domain, and the GMM contours
clrLite = [1 0.6 0.6 ; 0.6 1 0.6 ; 0.6 0.6 1];
clrDark = [0.7 0 0 ; 0 0.7 0 ; 0 0 0.7];
[X,Y] = meshgrid(linspace(mn(1),mx(1),50), linspace(mn(2),mx(2),50));
C = cluster(gmm, [X(:) Y(:)]);
image(X(:), Y(:), reshape(C,size(X))), hold on
gscatter(data(:,1), data(:,2), species, clrDark)
h = ezcontour(@(x,y)pdf(gmm,[x y]), [mn(1) mx(1) mn(2) mx(2)]);
set(h, 'LineColor','k', 'LineStyle',':')
hold off, axis xy, colormap(clrLite)
title('2D data and fitted GMM'), xlabel('PC1'), ylabel('PC2')

EM cluster
