Understanding Gaussian Mixture Models


Hansner

I'm trying to understand the results of the scikit-learn Gaussian Mixture Model implementation. See the example below:

#!/opt/local/bin/python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.mixture import GaussianMixture

# Define simple gaussian
def gauss_function(x, amp, x0, sigma):
    return amp * np.exp(-(x - x0) ** 2. / (2. * sigma ** 2.))

# Generate sample from three gaussian distributions
samples = np.random.normal(-0.5, 0.2, 2000)
samples = np.append(samples, np.random.normal(-0.1, 0.07, 5000))
samples = np.append(samples, np.random.normal(0.2, 0.13, 10000))

# Fit GMM
gmm = GaussianMixture(n_components=3, covariance_type="full", tol=0.001)
gmm = gmm.fit(X=np.expand_dims(samples, 1))

# Evaluate GMM
gmm_x = np.linspace(-2, 1.5, 5000)
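# Note: score_samples returns the per-sample log of the probability
# density, hence the exponential below.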
gmm_y = np.exp(gmm.score_samples(gmm_x.reshape(-1, 1)))

# Construct function manually as sum of gaussians
gmm_y_sum = np.full_like(gmm_x, fill_value=0, dtype=np.float32)
for m, c, w in zip(gmm.means_.ravel(), gmm.covariances_.ravel(),
                   gmm.weights_.ravel()):
    gmm_y_sum += gauss_function(x=gmm_x, amp=w, x0=m, sigma=np.sqrt(c))

# Normalize so that integral is 1    
gmm_y_sum /= np.trapz(gmm_y_sum, gmm_x)

# Make regular histogram
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=[8, 5])
ax.hist(samples, bins=50, density=True, alpha=0.5, color="#0070FF")
ax.plot(gmm_x, gmm_y, color="crimson", lw=4, label="GMM")
ax.plot(gmm_x, gmm_y_sum, color="black", lw=4, label="Gauss_sum")

# Annotate diagram
ax.set_ylabel("Probability density")
ax.set_xlabel("Arbitrary units")

# Draw legend
plt.legend()
plt.show()

[Result graph of the above code: the score_samples curve (red) follows the histogram (blue), while the manually summed curve (black) does not.]

Here I first generate a sample distribution built from three Gaussians and then fit a Gaussian mixture model to these data. Next, I want to calculate the probability density of some given input. Conveniently, the scikit-learn implementation provides the score_samples method to do this. Now I am trying to understand these results. I always thought that I could take the Gaussian parameters from the GMM fit and construct the same distribution by summing the components and then normalizing the integral to 1. However, as you can see in the plot, the density obtained from score_samples (red line) matches the original data (blue histogram) perfectly, whereas the manually constructed distribution (black line) does not. I'd like to understand what's wrong with my idea and why I can't construct the distribution myself by summing the Gaussians given by the GMM fit. Thanks so much for your input!

Hansner

Just in case someone wants to know the same thing in the future: the individual components must be normalized, not the sum. A mixture density is a weighted sum of unit-area Gaussians (the weights already sum to 1), so passing amp=w scaled each component's peak height instead of its area. The corrected script:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.mixture import GaussianMixture

# Define simple gaussian
def gauss_function(x, amp, x0, sigma):
    return amp * np.exp(-(x - x0) ** 2. / (2. * sigma ** 2.))

# Generate sample from three gaussian distributions
samples = np.random.normal(-0.5, 0.2, 2000)
samples = np.append(samples, np.random.normal(-0.1, 0.07, 5000))
samples = np.append(samples, np.random.normal(0.2, 0.13, 10000))

# Fit GMM
gmm = GaussianMixture(n_components=3, covariance_type="full", tol=0.001)
gmm = gmm.fit(X=np.expand_dims(samples, 1))

# Evaluate GMM
gmm_x = np.linspace(-2, 1.5, 5000)
gmm_y = np.exp(gmm.score_samples(gmm_x.reshape(-1, 1)))

# Construct function manually as sum of gaussians
gmm_y_sum = np.full_like(gmm_x, fill_value=0, dtype=np.float32)
for m, c, w in zip(gmm.means_.ravel(), gmm.covariances_.ravel(), gmm.weights_.ravel()):
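    # Normalize this component to unit area first, then weight it by w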
    gauss = gauss_function(x=gmm_x, amp=1, x0=m, sigma=np.sqrt(c))
    gmm_y_sum += gauss / np.trapz(gauss, gmm_x) * w

# Make regular histogram
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=[8, 5])
ax.hist(samples, bins=50, density=True, alpha=0.5, color="#0070FF")
ax.plot(gmm_x, gmm_y, color="crimson", lw=4, label="GMM")
ax.plot(gmm_x, gmm_y_sum, color="black", lw=4, label="Gauss_sum", linestyle="dashed")

# Annotate diagram
ax.set_ylabel("Probability density")
ax.set_xlabel("Arbitrary units")

# Make legend
plt.legend()

plt.show()
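
As a quick numerical check, the manually constructed curve should now agree with the exponentiated score_samples output pointwise; a minimal sketch, reusing the arrays from the script above:

# Quick check (run after the script above): both arrays are densities
# over gmm_x, so their pointwise difference should be tiny
# (float32 rounding plus trapezoidal-integration error).
print(np.max(np.abs(gmm_y - gmm_y_sum)))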

[Result graph of the corrected code: the manually constructed density (dashed black) now overlaps the score_samples curve (red).]
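
For what it's worth, the numerical normalization with np.trapz can also be replaced by an analytic one. A minimal sketch, assuming scipy is available and reusing gmm and gmm_x from the script above: scipy.stats.norm.pdf already integrates to 1, so each component only needs to be weighted.

from scipy.stats import norm

# Alternative sketch (assumes scipy is installed): norm.pdf is already
# normalized to unit area, so no trapezoidal integration is needed.
gmm_y_alt = np.zeros_like(gmm_x)
for m, c, w in zip(gmm.means_.ravel(), gmm.covariances_.ravel(),
                   gmm.weights_.ravel()):
    gmm_y_alt += w * norm.pdf(gmm_x, loc=m, scale=np.sqrt(c))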
