How to use PCs (generated by PCA) on a dataset in R?

Alavind Rajan

I am learning R. I'm working on a "Human Activity Recognition" dataset from the internet. It has 563 variables; the last one is the class variable "Activity", which must be predicted.

I am trying to use the k-NN algorithm from R's caret package.

I created another dataset with the 561 numeric variables, excluding the last two (subject and Activity).

I ran PCA on this and decided to use the first 20 PCs.

pca1 <- prcomp(human2, scale. = TRUE)
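The question uses 20 PCs in the text and 15 in the code without saying how the number was chosen. One common heuristic is the cumulative proportion of variance explained. A minimal base-R sketch, using the built-in iris measurements as a stand-in for the 561 predictors; the 95% cut-off is an arbitrary assumption, not something from the thread:

```r
# Sketch: choose how many PCs to keep from the cumulative variance explained.
# iris stands in for the asker's predictors; 0.95 is an arbitrary cut-off.
pca_demo <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)

# The variance of each PC is its squared standard deviation
var_explained <- pca_demo$sdev^2 / sum(pca_demo$sdev^2)
cum_var <- cumsum(var_explained)
print(round(cum_var, 3))

# Smallest number of PCs whose cumulative share reaches the cut-off
n_pcs <- which(cum_var >= 0.95)[1]
print(n_pcs)
```

summary(pca_demo) reports the same numbers in its "Cumulative Proportion" row.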

I store the scores for these PCs in another object called "newdat":

newdat <- pca1$x[ ,1:20]
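pca1$x holds PC scores only for the rows prcomp() was fitted on. Rows from the test file ("humant" in the code further down) have to be projected onto the same components, using the training centre, scale and rotation, rather than being put through a fresh PCA. A base-R sketch, with iris split into hypothetical training and test parts standing in for the asker's two files:

```r
# Sketch: project new rows onto PCs that were estimated on training rows.
# iris and the 100/50 split are stand-ins for the asker's train/test files.
set.seed(3333)
idx <- sample(nrow(iris), 100)
train_x <- iris[idx, 1:4]
test_x  <- iris[-idx, 1:4]

pca_train <- prcomp(train_x, center = TRUE, scale. = TRUE)

# predict() on a prcomp object re-uses the training centre, scale and rotation
test_scores <- predict(pca_train, newdata = test_x)[, 1:2]

# The same projection done by hand, to show what predict() computes
manual <- scale(test_x, center = pca_train$center, scale = pca_train$scale) %*%
  pca_train$rotation[, 1:2]
print(all.equal(unname(test_scores), unname(manual)))   # TRUE
```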

Now I am trying to run the following code, but it gives me an error because newdat doesn't contain my class variable:

trctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 3)
set.seed(3333)
knn_fit <- train(Activity ~., data = newdat, method = "knn",
                 trControl=trctrl,
                 preProcess = c("center", "scale"),
                 tuneLength = 10)

I'm trying to extract the last column, "Activity", from the raw data and append it to "newdat" with cbind() so that I can use it in knn_fit (above), but it is not getting appended.
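A likely reason the appended column misbehaves: pca1$x[, 1:15] is a numeric matrix, not a data frame, and cbind() on a matrix returns another matrix, which can hold only one type. A base-R sketch of the effect, with iris as a stand-in for the HAR data and Species playing the role of Activity:

```r
# Sketch: cbind() of a numeric matrix and a character vector.
# iris stands in for the HAR data, Species for Activity.
pcs <- prcomp(iris[, 1:4], scale. = TRUE)$x[, 1:2]   # numeric matrix of PC scores

# A matrix holds a single type, so the PC scores are coerced to strings
bad <- cbind(pcs, Species = as.character(iris$Species))
print(typeof(bad))   # "character"
print(bad[1, ])      # PC1 and PC2 are now text

# data.frame() keeps a separate type per column
good <- data.frame(pcs, Species = iris$Species)
str(good)            # PC1, PC2 numeric; Species a factor
```

Passing the factor itself to cbind() would not help either: the matrix would store the factor's integer codes instead of the labels.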

Any suggestions on how to use the PCs?

Here is the code:

library(caret)

# read the training and test files
human1 <- read.csv("C:/NIIT/Term 2/Prog for Analytics II/human-activity-recognition-with-smartphones (1)/train1.csv", header = TRUE)
humant <- read.csv("C:/NIIT/Term 2/Prog for Analytics II/human-activity-recognition-with-smartphones (1)/test1.csv", header = TRUE)

#taking the predictor columns (the 561 features, without subject and Activity)
human2 <- human1[ ,1:561]

# PCA on the scaled predictors, keeping the scores of the first 15 PCs
pca1 <- prcomp(human2, scale. = TRUE)
newdat <- pca1$x[ ,1:15]
# append the class label
newdat <- cbind(newdat, Activity = as.character(human1$Activity))

# caret's own PCA preprocessing (note: this reassigns pca1)
pca1 <- preProcess(human1[,1:561],
                   method=c("BoxCox", "center",
                            "scale", "pca"))
PC <- predict(pca1, human1[,1:561])

# k-NN tuned with 10-fold cross-validation, repeated 3 times
trctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 3)
set.seed(3333)
knn_fit <- train(Activity ~., data = newdat, method = "knn",
                 trControl=trctrl,
                 preProcess = c("center", "scale"),
                 tuneLength = 10)

#applying knn_fit to test data

test_pred <- predict(knn_fit, newdata = testing)
test_pred

#checking the prediction
confusionMatrix(test_pred, testing$V1)
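The posted code builds a caret preProcess() object with "pca" but never uses the resulting PC scores. caret can instead run the centring, scaling and PCA inside train(), re-estimating them within each resample, and predict() then applies the stored transformation to raw test rows. A sketch under stated assumptions: iris stands in for the HAR data and Species for Activity; the 70/30 split, pcaComp = 2 and tuneLength = 5 are illustrative choices, and BoxCox from the question is left out.

```r
library(caret)

# Sketch: let train() handle centring, scaling and PCA itself.
# iris stands in for the HAR data and Species for Activity.
set.seed(3333)
idx <- createDataPartition(iris$Species, p = 0.7, list = FALSE)
train_df <- iris[idx, ]
test_df  <- iris[-idx, ]

# preProcOptions is passed through to preProcess(); keep 2 PCs
trctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 3,
                       preProcOptions = list(pcaComp = 2))

knn_pca <- train(Species ~ ., data = train_df, method = "knn",
                 preProcess = c("center", "scale", "pca"),
                 trControl = trctrl, tuneLength = 5)

# predict() applies the stored preprocessing (including the rotation)
# to the raw test predictors before the k-NN step
test_pred <- predict(knn_pca, newdata = test_df)
confusionMatrix(test_pred, test_df$Species)
```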

I am getting an error in the part below. Here is the error:

> knn_fit <- train(Activity ~., data = newdat, method = "knn",
+                  trControl=trctrl,
+                  preProcess = c("center", "scale"),
+                  tuneLength = 10)
Error: cannot allocate vector of size 1.3 Gb
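The thread does not pin down the cause of the "cannot allocate vector of size 1.3 Gb" error. One plausible contributor, given the cbind() call before train(): once every PC score is a string, the formula interface treats each PC column as categorical, and every distinct value becomes its own dummy column in the design matrix. A small base-R sketch of that expansion with made-up data (n = 200 is arbitrary):

```r
# Sketch: a character "score" column in a model formula.
# model.matrix() turns character columns into factors, and every distinct
# value after the first becomes its own dummy column.
n <- 200
scores <- seq(-1, 1, length.out = n)   # 200 distinct numeric values

as_number <- data.frame(y = seq_len(n), score = scores)
as_string <- data.frame(y = seq_len(n), score = as.character(scores))

mm_num <- model.matrix(y ~ score, data = as_number)
mm_chr <- model.matrix(y ~ score, data = as_string)

print(ncol(mm_num))   # 2: intercept + one slope
print(ncol(mm_chr))   # 200: intercept + 199 dummy columns
```

If the training file has thousands of rows and each of the 15 PC columns is stored as text, the same expansion would produce tens of thousands of dummy columns, which could plausibly account for the allocation failure.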

Manuel Bicker

How are you trying to cbind the column? Can you please show the code? I think you just ran into the stringsAsFactors = TRUE default. Does the following line solve your problem?

#...
#newdat <- pca1$x[ ,1:20]
newdat <- cbind(newdat, Activity = as.character(human1$Activity))
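A sketch of the same workflow that uses data.frame() instead of cbind(), so the PC scores stay numeric and the label stays a factor, with the test rows projected through predict() on the training prcomp object. It assumes iris as a stand-in for the HAR data and Species for Activity; the 70/30 split, the 2 PCs and tuneLength = 5 are illustrative assumptions.

```r
library(caret)

# Sketch: PCA scores plus class label in a data frame, then k-NN.
# iris stands in for the HAR data and Species for Activity.
set.seed(3333)
idx <- createDataPartition(iris$Species, p = 0.7, list = FALSE)
train_df <- iris[idx, ]
test_df  <- iris[-idx, ]

pca_train <- prcomp(train_df[, 1:4], center = TRUE, scale. = TRUE)
n_pcs <- 2

# data.frame() keeps the PC scores numeric and the label a factor
train_pcs <- data.frame(pca_train$x[, 1:n_pcs], Species = train_df$Species)

# Test rows are projected with the training rotation, not a new PCA
test_pcs <- data.frame(predict(pca_train, newdata = test_df[, 1:4])[, 1:n_pcs],
                       Species = test_df$Species)

trctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 3)
knn_fit <- train(Species ~ ., data = train_pcs, method = "knn",
                 trControl = trctrl, tuneLength = 5)

test_pred <- predict(knn_fit, newdata = test_pcs)
confusionMatrix(test_pred, test_pcs$Species)
```

The preProcess = c("center", "scale") from the question is left out here: the PC scores are already centred, and whether to rescale them to equal variance is a modelling choice.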
