How can I compute a principal component analysis using my dataset to draw a graph showing PC1 vs PC2?


jaria20

Please can someone suggest me how to do principal component analysis with the gene dataset I have.

I have a table with 15 columns. The first is the disease group, where 0 is control, 1 is ulcerative colitis, and 2 is Crohn's disease.

The remaining 14 columns correspond to 14 different genes.

I would like to plot PC1 vs PC2 (via prcomp) after PCA to show if any clustering or separation occurs between the three groups based on the gene expression data (each axis shows the proportion of variation). However, I'm having a hard time knowing where to start because I can't convert column 1 to row names with row.names = 1 because R doesn't allow duplicate row names.

Converting to a matrix and trying the following code doesn't work.

mockdata1 <- as.matrix(mockdata)
rownames(mockdata1) <- mockdata1[,1]
mockdata1[,1] <- NULL
or 
mockdata2 <-mockdata1 [ ,-1]

In the previous example, I've been able to calculate the PCA and plot PCA1 vs PCA2 and color the data accordingly, as per row.names = 1, but I'm not sure how to overcome the first initial problem of not using it here.

I have included my data below via dput(head(mockdata))

structure(list(Disease = c(1L, 1L, 0L, 0L, 2L, 2L), Gene1 = c(9104.774619, 
35924.12358, 6.780294688, 1284.690716, 69.50341155, 3935.107345
), Gene2 = c(5224.114486, 35625.73119, 18.35291351, 511.9272679, 
186.7270146, 47611.65544), Gene3 = c(1472.348466, 137571.5525, 
20.78531289, 3019.140256, 146.9615338, 108935.1303), Gene4 = c(2487.124686, 
147604.774, 3.574347972, 1371.576262, 210.6773417, 82831.97458
), Gene5 = c(1872.328747, 235675.6461, 9.834667594, 583.1631957, 
120.6931223, 75874.49936), Gene6 = c(1675.724728, 35931.1852, 
9.91026361, 1634.038443, 58.04818134, 23502.78972), Gene7 = c(3775.885073, 
169672.9921, 5.41305941, 929.2125312, 97.72621248, 46023.7009
), Gene8 = c(5015.202216, 137455.0032, 2.995124554, 1113.882634, 
83.17636201, 14048.19237), Gene9 = c(883.5716868, 45920.44167, 
6.399646876, 892.313155, 117.1104906, 10825.47974), Gene10 = c(1607.790858, 
146627.0588, 1.967559425, 1237.299298, 90.8941744, 32747.04713
), Gene11 = c(2345.478241, 91047.57303, 12.33867961, 663.576224, 
384.5839119, 6692.728154), Gene12 = c(2772.362496, 15511.96753, 
15.64843017, 4143.085461, 169.545757, 22484.03574), Gene13 = c(4131.51741, 
48601.7059, 21.66175797, 2250.0628, 316.0677196, 16612.6508), 
Gene14 = c(1252.440598, 54794.36695, 2.925615978, 708.0342528, 
211.822519, 14021.28425)), row.names = c(NA, 6L), class = "data.frame")
Shaheen

I read about the statistics behind PCA a long time ago. It basically generates df*t(df). If it is transposed, the whole gene can get 14 PCs. Alternatively, you can do this in 6 samples. Didn't know you were interested in comparing genes or diseases?

Now, just change the disease column.

df1 <- df[,-1]

prcomp(df1, scale=TRUE)
Standard deviations (1, .., p=6):
[1] 3.535100e+00 1.197773e+00 2.341240e-01 1.164908e-01 4.575297e-03 1.341079e-16

Rotation (n x k) = (14 x 6):
             PC1         PC2         PC3          PC4         PC5         PC6
Gene1  0.2599087 -0.28372577  0.85593436  0.105415154  0.22237406 -0.03275210
Gene2  0.2217516  0.51374227  0.19750686  0.587702793 -0.28955565 -0.05771744
Gene3  0.2658620  0.28239514 -0.18315564  0.178995551  0.20110401  0.55717953
Gene4  0.2793220  0.12737931 -0.15464656  0.169556323  0.07195134 -0.18818707
Gene5  0.2818335 -0.06260368 -0.17524380  0.066508759  0.19418520 -0.23609300
Gene6  0.2753095  0.19146854 -0.05941163 -0.009035774  0.27267048 -0.02872877
Gene7  0.2804052 -0.10775145 -0.11519116  0.035882068  0.20239393 -0.18183889
Gene8  0.2697234 -0.25139746 -0.04378742 -0.067463492  0.21022545  0.41231175
Gene9  0.2787618 -0.13908918 -0.13407274 -0.103917770 -0.05348686  0.14863994
Gene10 0.2782801 -0.14660837 -0.15870274 -0.039789645  0.20346369 -0.24515625
Gene11 0.2672274 -0.27313184 -0.09102046 -0.087085239 -0.41532869  0.18704109
Gene12 0.2089616  0.55676419  0.20707121 -0.730763260  0.02663728 -0.08109933
Gene13 0.2819621 -0.06202963  0.11256511 -0.133063115 -0.54206507  0.22546812
Gene14 0.2797161 -0.12219139 -0.12034622 -0.025956127 -0.33551757 -0.46422864

For PC across diseases, transpose your matrix:

df2 <- t(df)
colnames(df2) <- df$Disease
df3 <- df2[-1,]
prcomp(df3,scale=TRUE)
Standard deviations (1, .., p=6):
[1] 1.3881805 1.3086775 1.0332001 0.8999575 0.5686278 0.3994426

Rotation (n x k) = (6 x 6):
         PC1        PC2         PC3        PC4        PC5         PC6
1 -0.1761229 -0.4084611  0.42521515 -0.7364243  0.2424742 -0.14218944
1  0.5843042  0.3073572 -0.04722848 -0.2941286  0.4419985  0.52916471
0 -0.5145338  0.4178422  0.04746496 -0.3238892 -0.4356411  0.51353922
0 -0.4042091  0.3652627  0.47093069  0.3634759  0.5908253 -0.01526729
2 -0.3531166  0.1750554 -0.74358399 -0.2618183  0.3973516 -0.25555829
2  0.2734007  0.6324854  0.20003938 -0.2561246 -0.2215796 -0.60868811

Related


how can i draw graph in spring mvc in jsp using highchart

Francisco Gallardo: i want to draw graph in jsp unsando spring mvc and highchart but i still can't do it, i wait for your answer thanks My point, this is where I want to draw the graph, but the graph is not showing, it shows nothing <%@ taglib uri="http://java

How can I draw an energy ranking graph using Python?

twin penguins This is a typical energy ranking used in some published papers, and I am struggling to reproduce one for my data using Python (any matplotlib, sns, etc.). I structure the data in a Pandas dataframe like this: System Color_id Data1 Data2 Form I

How can I fix my python code to draw the graph?

Joseph Tannuri My code exits with code 0, so my code has no errors, but it's still running without drawing the flow on the cylinder I want, can anyone help me with this? import sympy as s import matplotlib.pyplot as plt import numpy as np from

linebacker pc2 --> pc3 --> pc1

XXX duis I have 3 computers pc1, pc2, pc3 This is the topology of my current network Usually I use pc2 to access pc1, the network latency is as follows pc2 ping pc1 500ms pc3 ping pc1 200ms pc2 ping pc3 100ms I want to optimize the above network pc2 --> pc3 -

How can I draw this graph using iris data?

Special case agency Using the iris dataset I want to draw a graph like this: Using viewport(), the scatter plot has a width and height of 0.66 I have two questions: 1.) As you can see in the second plot (right) the lines are smoother, but in the first plot (ri

How can I draw this graph using iris data?

Special case agency Using the iris dataset I want to draw a graph like this: Using viewport(), the scatter plot has a width and height of 0.66 I have two questions: 1.) As you can see in the second plot (right) the lines are smoother, but in the first plot (ri

How can I draw this graph using iris data?

Special case agency Using the iris dataset I want to draw a graph like this: Using viewport(), the scatter plot has a width and height of 0.66 I have two questions: 1.) As you can see in the second plot (right) the lines are smoother, but in the first plot (ri

how can i draw graph in spring mvc in jsp using highchart

Francisco Gallardo: i want to draw graph in jsp unsando spring mvc and highchart but i still can't do it, i wait for your answer thanks My point, this is where I want to draw the graph, but the graph is not showing, it shows nothing <%@ taglib uri="http://java

How can I draw an energy ranking graph using Python?

twin penguins This is a typical energy ranking used in some published papers, and I am struggling to reproduce one for my data using Python (any matplotlib, sns, etc.). I structure the data in a Pandas dataframe like this: System Color_id Data1 Data2 Form I

How can I fix my python code to draw the graph?

Joseph Tannuri My code exits with code 0, so my code has no errors, but it's still running without drawing the flow on the cylinder I want, can anyone help me with this? import sympy as s import matplotlib.pyplot as plt import numpy as np from

How can I draw this graph in Android using XML?

João Correia I'm trying to draw a form in an image in Android XML. But so far my attempts have been unsuccessful. what should I do? Thanks in advance place If you use RelativeLayout, you can make a black rectangle and put a white triangle image on top. I can't

How can I draw this graph using iris data?

Special case agency Using the iris dataset I want to draw a graph like this: Using viewport(), the scatter plot has a width and height of 0.66 I have two questions: 1.) As you can see in the second plot (right) the lines are smoother, but in the first plot (ri

How can I draw this graph with 2 loops and 1 if statement?

revenge How can I draw such a graph in C with 2 for loops and 1 if statement? AAAAA AAAAB AAABB AAAAB AAAAA I know I have to compare rows and columns, but I don't know how to get accurate results. It must work for every number greater than 0 and less than 24.

How can I draw this graph with 2 loops and 1 if statement?

revenge How can I draw such a graph with 2 for loops and 1 if statement? AAAAA AAAAB AAABB AAAAB AAAAA I know I have to compare rows and columns, but I don't know how to get accurate results. It must work for every number greater than 0 and less than 24. What

Unable to grant access to PC2 to Git repository using PC1 SSH key

Gustavo Brar I have a PC1 and I can access the repository $ git cloneusing an SSH connection (the command works fine). I've also tried giving that repository access to PC2, so I've done copying the key and copying id_rsafrom PC1 id_rsa.pubto my PC2 ~/.sshfolde

How can I fix my python code to draw the graph?

Joseph Tannuri My code exits with code 0, so my code has no errors, but it's still running without drawing the flow on the cylinder I want, can anyone help me with this? import sympy as s import matplotlib.pyplot as plt import numpy as np from

How to get date and time from PC2 to PC1?

Certification Hi, I'm having trouble with some problems, I searched some articles but there are some examples PHP. As my title above, how do I get information Date and Timefrom my - PC2(Server side where I publish my project) (example of date and time is 8/12/

How can I fix my python code to draw the graph?

Joseph Tannuri My code exits with code 0, so my code has no errors, but it's still running without drawing the flow on the cylinder I want, can anyone help me with this? import sympy as s import matplotlib.pyplot as plt import numpy as np from

How can I fix my python code to draw the graph?

Joseph Tannuri My code exits with code 0, so my code has no errors, but it's still running without drawing the flow on the cylinder I want, can anyone help me with this? import sympy as s import matplotlib.pyplot as plt import numpy as np from

How can I draw this graph with 2 loops and 1 if statement?

revenge How can I draw such a graph in C with 2 for loops and 1 if statement? AAAAA AAAAB AAABB AAAAB AAAAA I know I have to compare rows and columns, but I don't know how to get accurate results. It must work for every number greater than 0 and less than 24.

Unable to grant access to PC2 to Git repository using PC1 SSH key

Gustavo Brar I have a PC1 and I can access the repository $ git cloneusing an SSH connection (the command works fine). I've also tried giving that repository access to PC2, so I've done copying the key and copying id_rsafrom PC1 id_rsa.pubto my PC2 ~/.sshfolde