Understanding Gaussian Mixture Models


Hansner

I'm trying to understand the results of the scikit-learn Gaussian Mixture Model implementation. See the example below:

#!/opt/local/bin/python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.mixture import GaussianMixture

# Define simple gaussian
def gauss_function(x, amp, x0, sigma):
    return amp * np.exp(-(x - x0) ** 2. / (2. * sigma ** 2.))

# Generate sample from three gaussian distributions
samples = np.random.normal(-0.5, 0.2, 2000)
samples = np.append(samples, np.random.normal(-0.1, 0.07, 5000))
samples = np.append(samples, np.random.normal(0.2, 0.13, 10000))

# Fit GMM
gmm = GaussianMixture(n_components=3, covariance_type="full", tol=0.001)
gmm = gmm.fit(X=np.expand_dims(samples, 1))

# Evaluate GMM
gmm_x = np.linspace(-2, 1.5, 5000)
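# Note: score_samples returns the per-sample log of the probability
# density, hence the exponential below.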
gmm_y = np.exp(gmm.score_samples(gmm_x.reshape(-1, 1)))

# Construct function manually as sum of gaussians
gmm_y_sum = np.full_like(gmm_x, fill_value=0, dtype=np.float32)
for m, c, w in zip(gmm.means_.ravel(), gmm.covariances_.ravel(),
                   gmm.weights_.ravel()):
    gmm_y_sum += gauss_function(x=gmm_x, amp=w, x0=m, sigma=np.sqrt(c))

# Normalize so that integral is 1    
gmm_y_sum /= np.trapz(gmm_y_sum, gmm_x)

# Make regular histogram
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=[8, 5])
ax.hist(samples, bins=50, density=True, alpha=0.5, color="#0070FF")
ax.plot(gmm_x, gmm_y, color="crimson", lw=4, label="GMM")
ax.plot(gmm_x, gmm_y_sum, color="black", lw=4, label="Gauss_sum")

# Annotate diagram
ax.set_ylabel("Probability density")
ax.set_xlabel("Arbitrary units")

# Draw legend
plt.legend()
plt.show()

[Result graph of the above code: the score_samples curve (red) follows the histogram (blue), while the manually summed curve (black) does not.]

Here I first generate a sample distribution built from three Gaussians and then fit a Gaussian mixture model to these data. Next, I want to calculate the probability density of some given input. Conveniently, the scikit-learn implementation provides the score_samples method to do this. Now I am trying to understand these results. I always thought that I could take the Gaussian parameters from the GMM fit and construct the same distribution by summing the components and then normalizing the integral to 1. However, as you can see in the plot, the density obtained from score_samples (red line) matches the original data (blue histogram) perfectly, whereas the manually constructed distribution (black line) does not. I'd like to understand what's wrong with my idea and why I can't construct the distribution myself by summing the Gaussians given by the GMM fit. Thanks so much for your input!

Hansner

Just in case someone wants to know the same thing in the future: the individual components must be normalized, not the sum. A mixture density is a weighted sum of unit-area Gaussians (the weights already sum to 1), so passing amp=w scaled each component's peak height instead of its area. The corrected script:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.mixture import GaussianMixture

# Define simple gaussian
def gauss_function(x, amp, x0, sigma):
    return amp * np.exp(-(x - x0) ** 2. / (2. * sigma ** 2.))

# Generate sample from three gaussian distributions
samples = np.random.normal(-0.5, 0.2, 2000)
samples = np.append(samples, np.random.normal(-0.1, 0.07, 5000))
samples = np.append(samples, np.random.normal(0.2, 0.13, 10000))

# Fit GMM
gmm = GaussianMixture(n_components=3, covariance_type="full", tol=0.001)
gmm = gmm.fit(X=np.expand_dims(samples, 1))

# Evaluate GMM
gmm_x = np.linspace(-2, 1.5, 5000)
gmm_y = np.exp(gmm.score_samples(gmm_x.reshape(-1, 1)))

# Construct function manually as sum of gaussians
gmm_y_sum = np.full_like(gmm_x, fill_value=0, dtype=np.float32)
for m, c, w in zip(gmm.means_.ravel(), gmm.covariances_.ravel(), gmm.weights_.ravel()):
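    # Normalize this component to unit area first, then weight it by w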
    gauss = gauss_function(x=gmm_x, amp=1, x0=m, sigma=np.sqrt(c))
    gmm_y_sum += gauss / np.trapz(gauss, gmm_x) * w

# Make regular histogram
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=[8, 5])
ax.hist(samples, bins=50, density=True, alpha=0.5, color="#0070FF")
ax.plot(gmm_x, gmm_y, color="crimson", lw=4, label="GMM")
ax.plot(gmm_x, gmm_y_sum, color="black", lw=4, label="Gauss_sum", linestyle="dashed")

# Annotate diagram
ax.set_ylabel("Probability density")
ax.set_xlabel("Arbitrary units")

# Make legend
plt.legend()

plt.show()
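
As a quick numerical check, the manually constructed curve should now agree with the exponentiated score_samples output pointwise; a minimal sketch, reusing the arrays from the script above:

# Quick check (run after the script above): both arrays are densities
# over gmm_x, so their pointwise difference should be tiny
# (float32 rounding plus trapezoidal-integration error).
print(np.max(np.abs(gmm_y - gmm_y_sum)))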

[Result graph of the corrected code: the manually constructed density (dashed black) now overlaps the score_samples curve (red).]
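
For what it's worth, the numerical normalization with np.trapz can also be replaced by an analytic one. A minimal sketch, assuming scipy is available and reusing gmm and gmm_x from the script above: scipy.stats.norm.pdf already integrates to 1, so each component only needs to be weighted.

from scipy.stats import norm

# Alternative sketch (assumes scipy is installed): norm.pdf is already
# normalized to unit area, so no trapezoidal integration is needed.
gmm_y_alt = np.zeros_like(gmm_x)
for m, c, w in zip(gmm.means_.ravel(), gmm.covariances_.ravel(),
                   gmm.weights_.ravel()):
    gmm_y_alt += w * norm.pdf(gmm_x, loc=m, scale=np.sqrt(c))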
