Can I fix the mean of one component of a Gaussian mixture model in python before fitting?


Benjamin Doughty

I am interested in fitting a 2-component Gaussian mixture model to the data shown below. However, since I'm plotting log-transformed counts here, normalized to lie in (0, 1], the maximum value my data can take is 0. When I try a naive fit using sklearn.mixture.GaussianMixture (code below), I get the fit shown further down, which is obviously not what I want.

[Figure: Log-transformed count rate data, cannot exceed 0]

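import matplotlib.pyplot as plt
from sklearn.mixture import GaussianMixture
import numpy as np

# start with some count data in (0,1]
# (placeholder so the snippet runs; replace with your own normalized counts)
counts = np.random.default_rng(0).uniform(1e-3, 1.0, size=1000)
logged_counts = np.log(counts)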
model = GaussianMixture(2).fit(logged_counts.reshape(-1,1))

# plot resulting fit
x_range = np.linspace(np.min(logged_counts), 0, 1000)
pdf = np.exp(model.score_samples(x_range.reshape(-1, 1)))
responsibilities = model.predict_proba(x_range.reshape(-1, 1))
pdf_individual = responsibilities * pdf[:, np.newaxis]

plt.hist(logged_counts, bins='auto', density=True, histtype='stepfilled', alpha=0.5)
plt.plot(x_range, pdf, '-k', label='Mixture')
plt.plot(x_range, pdf_individual, '--k', label='Components')
plt.legend()
plt.show()

[Figure: Fitting using sklearn's two-component GMM]

I would love it if I could fix the mean of the top component at 0 and only optimize the other mean, the two variances, and the mixing fractions. (Additionally, I would love to be able to use a half-normal for the component on the right.) Is there a simple way to do this with built-in functions in Python/sklearn, or will I have to build the model myself using a probabilistic programming language?

Drey

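AFAIK, you cannot do exactly what you want in sklearn: GaussianMixture lets you set the starting means (means_init), but EM re-estimates every mean, so none of them can be held fixed.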
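A small sketch to illustrate this, on synthetic data that only stands in for the asker's counts (a half-normal below 0 plus a Gaussian around -3, both my own assumptions): even if one mean is initialized at 0 via means_init, the fitted value moves away from 0.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic stand-in for log-transformed counts (all values <= 0).
rng = np.random.default_rng(0)
x = np.concatenate([-np.abs(rng.normal(0.0, 0.3, 3000)), rng.normal(-3.0, 0.5, 2000)])

# means_init only seeds EM; every mean is still re-estimated during fit().
gm = GaussianMixture(n_components=2, means_init=[[0.0], [-3.0]], random_state=0)
gm.fit(x.reshape(-1, 1))
print(gm.means_.ravel())  # the mean initialised at 0 has moved below 0
```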

IMHO, there are basically four strategies: (i) implement the GMM yourself, (ii) switch to another language/framework, (iii) adapt the sklearn GMM code, or (iv) adapt your approach to the problem.


(i) You probably do not want to do this unless you want to learn for yourself.


(ii) You could use Stan and adapt the code in the last paragraph to have a fixed component of your choice (distribution type and parameters).
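The core of the Stan route is writing the mixture log-likelihood yourself. If you would rather stay in Python, the same idea can be sketched as direct maximum likelihood with scipy.optimize.minimize. Note this is a swapped-in technique, not Stan: the half-normal-below-0 component, the parameterization (logit weight, log scales), the starting values and the synthetic data are all my own assumptions.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp
from scipy.stats import halfnorm, norm


def neg_log_lik(theta, x):
    """Negative log-likelihood of w * HalfNormal(x <= 0, scale s0) + (1 - w) * N(mu1, s1).

    theta = (logit of w, log s0, mu1, log s1); the transforms keep w in (0, 1)
    and both scales positive, so the optimizer can search without bounds.
    """
    z, log_s0, mu1, log_s1 = theta
    log_w = -np.logaddexp(0.0, -z)    # log(w) with w = 1 / (1 + exp(-z))
    log_1mw = -np.logaddexp(0.0, z)   # log(1 - w)
    # Half-normal living on x <= 0: evaluate the standard half-normal at -x.
    log_p0 = halfnorm.logpdf(-x, scale=np.exp(log_s0))
    log_p1 = norm.logpdf(x, loc=mu1, scale=np.exp(log_s1))
    log_mix = logsumexp(np.column_stack([log_w + log_p0, log_1mw + log_p1]), axis=1)
    return -np.sum(log_mix)


def fit_halfnormal_gaussian(x):
    x = np.asarray(x, dtype=float).ravel()
    s = np.std(x)
    # Crude start: equal weights, both scales = overall std, mu1 in the lower tail.
    theta0 = np.array([0.0, np.log(s), np.percentile(x, 25), np.log(s)])
    res = minimize(neg_log_lik, theta0, args=(x,), method="L-BFGS-B")
    z, log_s0, mu1, log_s1 = res.x
    return {"weight0": 1.0 / (1.0 + np.exp(-z)), "sigma0": np.exp(log_s0),
            "mu1": mu1, "sigma1": np.exp(log_s1), "success": res.success}


# Synthetic stand-in data (not the asker's data).
rng = np.random.default_rng(0)
x = np.concatenate([-np.abs(rng.normal(0.0, 0.3, 3000)), rng.normal(-3.0, 0.5, 2000)])
params = fit_halfnormal_gaussian(x)
print(params)
```

The transforms are a design choice: they avoid bounded optimization, and logsumexp keeps the mixture density from underflowing far from either component.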


(iii) You could do (i), but instead of starting from scratch, slightly adapt the sklearn code, or simply reuse its estimation methods with your own small modifications.
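For (i) or (iii), the change needed in EM is small: the E-step is unchanged, and in the M-step the fixed component's mean is simply never updated, so its variance is computed around the fixed value. A minimal NumPy/SciPy sketch for the 1-D, two-component case, assuming the function name, the crude initialization, the optional half-normal flag and the synthetic data are all my own (this is not sklearn's code):

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import norm


def em_fixed_mean(x, fixed_mean=0.0, half_normal=False, n_iter=1000, tol=1e-9):
    """EM for a 1-D, two-component mixture whose component 0 has a fixed mean.

    Component 0: N(fixed_mean, s0^2), or, if half_normal=True, that normal
    restricted to x <= fixed_mean (a mirrored half-normal; assumes no data
    lie above fixed_mean). Component 1: N(mu1, s1^2), all parameters free.
    """
    x = np.asarray(x, dtype=float).ravel()
    w = np.array([0.5, 0.5])
    s0 = s1 = np.std(x)
    mu1 = np.percentile(x, 25)  # crude start: component 1 begins in the lower tail
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: log(weight * density) per component, then responsibilities.
        log_p0 = norm.logpdf(x, fixed_mean, s0)
        if half_normal:
            log_p0 = np.where(x <= fixed_mean, log_p0 + np.log(2.0), -np.inf)
        log_p1 = norm.logpdf(x, mu1, s1)
        log_weighted = np.column_stack([np.log(w[0]) + log_p0, np.log(w[1]) + log_p1])
        log_total = logsumexp(log_weighted, axis=1)
        resp = np.exp(log_weighted - log_total[:, None])
        ll = log_total.sum()  # log-likelihood of the current parameters
        # M-step: usual GMM updates, except component 0 keeps its fixed mean.
        nk = resp.sum(axis=0)
        w = nk / x.size
        s0 = np.sqrt(np.sum(resp[:, 0] * (x - fixed_mean) ** 2) / nk[0])
        mu1 = np.sum(resp[:, 1] * x) / nk[1]
        s1 = np.sqrt(np.sum(resp[:, 1] * (x - mu1) ** 2) / nk[1])
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return {"weights": w, "sigma0": s0, "mu1": mu1, "sigma1": s1, "loglik": ll}


# Synthetic stand-in data (not the asker's data).
rng = np.random.default_rng(0)
x = np.concatenate([-np.abs(rng.normal(0.0, 0.3, 3000)), rng.normal(-3.0, 0.5, 2000)])
result = em_fixed_mean(x, fixed_mean=0.0, half_normal=True)
print(result)
```

For the half-normal case the scale update is the same weighted mean of squares, since the factor of 2 in the density does not depend on s0.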


(iv)

  • A Gaussian mixture model will not work here (as you mentioned) because you need a truncated normal distribution for the "first" (fixed) component.
  • If you did not need to fit the variance of the fixed component, you could simply subtract the fixed component from the data (i.e. for each point, subtract the point's quantile value from the point value).
  • If you don't mind a less accurate estimate, you can do this in stages: first, identify the two components using the GMM. Then look only at the data in the component you want to fix and fit a truncated Gaussian model to it (using .fit(data)). Subtract the resulting fitted component from the original data (as in the previous point), then fit a GMM again to find the next component.
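A rough sketch of the staged approach in the last point, under my own interpretation: instead of literally subtracting, it drops the points the GMM assigns to the upper component and fits the remaining Gaussian to the rest. It uses the fact that a zero-mean normal truncated at 0 is a mirrored half-normal, so scipy.stats.halfnorm.fit with floc=0 gives its scale. The function name and synthetic data are hypothetical, and the hard assignment makes the estimates slightly biased when the components overlap.

```python
import numpy as np
from scipy.stats import halfnorm, norm
from sklearn.mixture import GaussianMixture


def two_stage_fit(x, random_state=0):
    x = np.asarray(x, dtype=float).ravel()
    X = x.reshape(-1, 1)
    # Step 1: unconstrained GMM, used only to split the data into two groups.
    gm = GaussianMixture(n_components=2, random_state=random_state).fit(X)
    labels = gm.predict(X)
    upper = np.argmax(gm.means_.ravel())  # the component closest to 0
    in_upper = labels == upper
    # Step 2: zero-mean normal truncated at 0 == half-normal of -x; fit its scale.
    _, sigma0 = halfnorm.fit(-x[in_upper], floc=0)
    # Step 3: fit the remaining Gaussian to the points not in the fixed component.
    mu1, sigma1 = norm.fit(x[~in_upper])
    return {"weight0": in_upper.mean(), "sigma0": sigma0, "mu1": mu1, "sigma1": sigma1}


# Synthetic stand-in data (not the asker's data).
rng = np.random.default_rng(0)
x = np.concatenate([-np.abs(rng.normal(0.0, 0.3, 3000)), rng.normal(-3.0, 0.5, 2000)])
staged = two_stage_fit(x)
print(staged)
```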

Hope this helps :-)
