Plot Gaussian mixture in R with ggplot2


Mr. Zen

I'm approximating a distribution with a mixture of Gaussians, and was wondering whether there is an easy way to automatically plot the estimated kernel density of the entire (1D) dataset as the sum of the component densities using ggplot2, similar to this:

[Figure: full data density plotted using individual component densities]

Given the following sample data, my approach in ggplot2 would be to manually overlay the density of each subset on the scaled density of the full data, as follows:

# example data
a <- rnorm(1000, 0, 1)  # component 1
b <- rnorm(1000, 5, 2)  # component 2
d <- c(a, b)            # overall data
df <- data.frame(d, id = rep(c(1, 2), each = 1000))  # add group id

## ggplot2
library(ggplot2)

ggplot(df) +
  geom_density(aes(x = d, y = ..scaled..)) +
  geom_density(data = subset(df, id == 1), aes(x = d), lty = 2) +
  geom_density(data = subset(df, id == 2), aes(x = d), lty = 4)

[Figure: ggplot2 output]

Note that the scales do not match in this plot. Scaling all three densities, or none of them, does not work either, so I cannot reproduce the plot above.

In addition, I cannot generate this plot automatically, that is, without subsetting by hand. I tried passing position = "stacked" as a parameter to geom_density.

I usually have around 5-6 components per dataset, so subsetting manually would be possible. However, I would like each component density to have its own color or line type, shown in the ggplot legend, and doing all the subsets by hand would increase the workload quite a bit.

Any ideas? Thanks!

missuse

Here is a possible solution: map each density in the aes call and draw it with position = "identity" in one layer, and use a stacked density without a legend in a second layer.

ggplot(df) +
  stat_density(aes(x = d, linetype = as.factor(id)), position = "stack", geom = "line", show.legend = FALSE, color = "red") +
  stat_density(aes(x = d, linetype = as.factor(id)), position = "identity", geom = "line")
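A note on scale: as far as I can tell, the stacked red curve is the plain sum of the per-group density estimates, each of which integrates to 1, so it is not on the scale of a single density. A minimal sketch of an alternative, assuming it is acceptable to compute the curves outside ggplot2: estimate each component on a shared grid with a shared bandwidth, weight it by its share of the observations, and sum. The names bw, grid_n, comp_mat, comp and mix are made up for this illustration, and set.seed is only there for reproducibility.

```r
# Sketch: overall density as the weighted sum of component densities.
set.seed(1)  # assumption: fixed seed so the example is reproducible
a <- rnorm(1000, 0, 1)
b <- rnorm(1000, 5, 2)
df <- data.frame(d = c(a, b), id = factor(rep(c(1, 2), each = 1000)))

# Assumption: one shared bandwidth and one shared grid for all components,
# so that the curves can be added point by point.
bw     <- bw.nrd0(df$d)
from   <- min(df$d) - 3 * bw
to     <- max(df$d) + 3 * bw
grid_n <- 512
x      <- seq(from, to, length.out = grid_n)

# One column per component: its KDE scaled by its share n_k / N.
comp_mat <- sapply(split(df$d, df$id), function(v) {
  density(v, bw = bw, from = from, to = to, n = grid_n)$y * length(v) / nrow(df)
})

# Long format for the components, plus their point-wise sum.
comp <- data.frame(
  x  = rep(x, times = ncol(comp_mat)),
  y  = as.vector(comp_mat),
  id = factor(rep(colnames(comp_mat), each = grid_n))
)
mix <- data.frame(x = x, y = rowSums(comp_mat))

# Plot only if ggplot2 is installed; the computation above is base R.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot() +
    geom_line(data = comp, aes(x = x, y = y, linetype = id)) +
    geom_line(data = mix, aes(x = x, y = y), color = "red")
}
```

Printing p draws the weighted components with one line type each (and a legend), plus the red sum, which integrates to roughly 1. Because id is just a factor column, the same code should work unchanged for 5-6 components.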


Do note that when using more than two groups:

  a <- rnorm(1000, 0, 1) 
  b <- rnorm(1000, 5, 2) 
  c <- rnorm(1000, 3, 2)
  d <- rnorm(1000, -2, 1)
  d <- c(a, b, c, d)
  df <- data.frame(d, id = as.factor(rep(c(1, 2, 3, 4), each = 1000))) 

a curve appears for every level of the stack (this is also a problem in the two-group example, but the linetype mapping in the first layer masked it; use group instead to check):

  ggplot(df) +
    stat_density(aes(x = d, group = id), position = "stack", geom = "line", show.legend = FALSE, color = "red") +
    stat_density(aes(x = d, linetype = id), position = "identity", geom = "line")


A relatively simple workaround is to map alpha to id in the stacked layer and then manually set the alpha of the unwanted curves to 0:

  ggplot(df) +
    stat_density(aes(x = d, alpha = id), position = "stack", geom = "line", show.legend = FALSE, color = "red") +
    stat_density(aes(x = d, linetype = id), position = "identity", geom = "line") +
    scale_alpha_manual(values = c(1, 0, 0, 0))
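With 5-6 components, typing the alpha vector by hand gets tedious. A small sketch that builds it from the number of groups, assuming (as in the call above) that the curve to keep belongs to the first factor level; top_only_alpha is a hypothetical helper, not a ggplot2 function.

```r
# Hypothetical helper: alpha 1 for the first level, 0 for all others.
# Assumes the first level is the top of the stack, as in c(1, 0, 0, 0) above.
top_only_alpha <- function(id) {
  k <- nlevels(as.factor(id))
  c(1, rep(0, k - 1))
}

# Example with four groups, matching the data in this answer.
id <- as.factor(rep(c(1, 2, 3, 4), each = 1000))
alpha_values <- top_only_alpha(id)
```

The result can then be passed as scale_alpha_manual(values = top_only_alpha(df$id)) in place of the hard-coded vector.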
