PyMC3 mixture models: help understanding the multivariate case


James Stirling

Suppose I have a dataframe with 4 variables. I want to see if I can generate a posterior for a gamma mixture over all variables, with the goal of finding clusters for each observation. I'm guessing I'm going to need some kind of multivariate gamma distribution? But what should I do?
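To make this concrete, here is a made-up stand-in for my dataframe (the column names and gamma parameters are arbitrary, just for illustration; the real data simply has four positive-valued columns):

import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
N = 500
cluster = rng.integers(0, 2, size=N)              # hidden cluster label per row
shapes = np.array([[2.0, 2.0, 5.0, 5.0],          # gamma shape parameters, cluster 0
                   [8.0, 8.0, 1.5, 1.5]])         # gamma shape parameters, cluster 1
samples = rng.gamma(shapes[cluster], 1.0)         # rate fixed at 1 for simplicity
df = pd.DataFrame(samples, columns=['v1', 'v2', 'v3', 'v4'])

X = df.values                                     # all four columns, shape (N, 4)
data = df['v1'].values                            # a single column, used in the 1-D example below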

Here is a pymc3 code example with a single variable, looking for a mixture of two gammas (I chose arbitrary prior parameters):

import numpy as np
import pymc3 as pm

with pm.Model() as m:
    # mixture weights for the two components
    p = pm.Dirichlet('p', a=np.ones(2))

    # shape and rate priors, one per component
    alpha = pm.Gamma('alpha', alpha=1, beta=1, shape=2)
    beta = pm.Gamma('beta', alpha=1, beta=1, shape=2)

    comp_dist = pm.Gamma.dist(alpha=alpha, beta=beta, shape=(2,))
    like = pm.Mixture('y', w=p, comp_dists=comp_dist, observed=data)  # data: 1-D array of positive values

    trace = pm.sample(1000)

So my question is: how can I extend this basic example to multiple variables? I'm assuming the relationship between the variables needs to be encoded in the model in some way? I feel like I understand the basics of mixture modeling, but at the same time I feel like I'm missing something fundamental.

Mel

This is how the multidimensional case works:

import numpy as np
import pymc3 as pm

J = 4  # number of dimensions (columns of the data)
K = 2  # number of clusters

with pm.Model() as m:
    p = pm.Dirichlet('p', a=np.ones(K))

    # one shape/rate pair per dimension and per cluster
    alpha = pm.Gamma('alpha', alpha=1, beta=1, shape=(J,K))
    beta  = pm.Gamma('beta',  alpha=1, beta=1, shape=(J,K))
    gamma = pm.Gamma.dist(alpha=alpha, beta=beta, shape=(J,K))

    like = pm.Mixture('y', w=p, comp_dists=gamma, observed=X, shape=J)

    trace = pm.sample(1000)

where X.shape should be (N, J).
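To get back to the stated goal of assigning each observation to a cluster, here is a rough, untested sketch of how you could compute mixture responsibilities from the fitted trace. The plug-in posterior means and the scipy.stats usage are my own choices, not something the model requires:

import numpy as np
from scipy import stats

# Plug-in point estimates from the posterior (assuming `trace` is the
# MultiTrace returned by pm.sample above; averaging over draws would be
# more principled than using posterior means).
p_hat = trace['p'].mean(axis=0)            # (K,) mixture weights
alpha_hat = trace['alpha'].mean(axis=0)    # (J, K) shape parameters
beta_hat = trace['beta'].mean(axis=0)      # (J, K) rate parameters

# log responsibility: log p_k + sum_j log Gamma(X[n, j] | alpha[j, k], beta[j, k])
log_resp = np.log(p_hat)[None, :] + stats.gamma.logpdf(
    X[:, :, None],                         # (N, J, 1)
    a=alpha_hat[None, :, :],               # (1, J, K)
    scale=1.0 / beta_hat[None, :, :],      # scipy uses scale = 1 / rate
).sum(axis=1)                              # sum over the J dimensions -> (N, K)

labels = log_resp.argmax(axis=1)           # hard cluster assignment per observation

The argmax gives hard labels; keeping the normalized responsibilities instead gives soft cluster memberships per observation.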


A note on symmetry breaking

The hard part will be solving the identifiability problem, but I think that's beyond the scope of the question. Maybe take a look at the GMM tutorial to see how to use the pm.Potential function to break symmetry. I would expect a highly correlated likelihood parameterization like alpha and beta to exacerbate this problem, so perhaps consider switching to a mu and sigma parameterization.
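For example, adapting the ordering-potential trick from the GMM tutorial to this model might look something like the following untested sketch (the variable names follow the model above; add this before calling pm.sample):

import theano.tensor as tt

with m:
    # Break label-switching symmetry by forcing the cluster means of the
    # first dimension to be ordered (the mean of a Gamma is alpha / beta).
    mean_dim0 = alpha[0, :] / beta[0, :]    # (K,) cluster means for variable 0
    order_potential = pm.Potential(
        'order_means',
        tt.switch(mean_dim0[1] - mean_dim0[0] < 0, -np.inf, 0)
    )

    # Alternative: if your PyMC3 version supports it, parameterize the
    # component gammas by mu/sigma instead of alpha/beta, which is often
    # less correlated:
    # mu    = pm.Gamma('mu', alpha=1, beta=1, shape=(J, K))
    # sigma = pm.HalfNormal('sigma', 1.0, shape=(J, K))
    # gamma = pm.Gamma.dist(mu=mu, sigma=sigma, shape=(J, K))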
