How to use PCs (generated by PCA) on a dataset in R?


Alavind Rajan

I am an R learner. I'm working on a "Human Activity Recognition" dataset from the internet. It has 563 variables; the last one is the class variable "Activity", which must be predicted.

I am trying to use the KNN algorithm from R's CARET package.

I created another dataset with 561 numeric variables, excluding the last 2 - subject and activity.

I ran PCA on this and decided to use the first 20 PCs.

pca1 <- prcomp(human2, scale = TRUE)

I keep the data for these PCs in another dataset called "newdat"

newdat <- pca1$x[ ,1:20]

Now I am trying to run the following code, but it gives me an error because newdat doesn't have my class variable:

trctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 3)
set.seed(3333)
knn_fit <- train(Activity ~., data = newdat, method = "knn",
                 trControl=trctrl,
                 preProcess = c("center", "scale"),
                 tuneLength = 10)

I'm trying to extract the last column, "Activity", from the raw data and append it to "newdat" with cbind() so that I can use it in knn_fit (above), but it's not getting appended.

Any suggestions on how to use the PCs?


Here is the code:

human1 <- read.csv("C:/NIIT/Term 2/Prog for Analytics II/human-activity-recognition-with-smartphones (1)/train1.csv", header = TRUE)
humant <- read.csv("C:/NIIT/Term 2/Prog for Analytics II/human-activity-recognition-with-smartphones (1)/test1.csv", header = TRUE)

#taking the predictor columns
human2 <- human1[ ,1:561]


pca1 <- prcomp(human2, scale = TRUE)
newdat <- pca1$x[ ,1:15]
newdat <- cbind(newdat, Activity = as.character(human1$Activity))

pca1 <- preProcess(human1[,1:561], 
                   method=c("BoxCox", "center", 
                            "scale", "pca"))
PC = predict(pca1, human1[,1:561])


trctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 3)
set.seed(3333)
knn_fit <- train(Activity ~., data = newdat, method = "knn",
                 trControl=trctrl,
                 preProcess = c("center", "scale"),
                 tuneLength = 10)

#applying knn_fit to test data

test_pred <- predict(knn_fit, newdata = testing)
test_pred

#checking the prediction
confusionMatrix(test_pred, testing$V1 )

I am getting an error in the part below. I have attached the error:

> knn_fit <- train(Activity ~., data = newdat, method = "knn",
+                  trControl=trctrl,
+                  preProcess = c("center", "scale"),
+                  tuneLength = 10)
Error: cannot allocate vector of size 1.3 Gb

Manuel Bicker

How are you trying to cbind the column? Can you please show the code? I think you just ran into stringsAsFactors = TRUE. Does the following line solve your problem?

#...
#newdat <- pca1$x[ ,1:20]    
# human2 only holds the 561 predictor columns, so take Activity from human1
newdat <- cbind(newdat, Activity = as.character(human1$Activity))
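
One thing to watch out for: cbind() on a numeric matrix and a character vector returns a character matrix, so the PC scores stop being numeric. Building the training set with data.frame() avoids that. The sketch below is only an illustration under assumptions: it uses the built-in iris data as a stand-in for the HAR files, keeps 2 PCs instead of 20, and the names train_raw, test_raw, pca_fit, bad, train_pc and test_pc are made up for this example.

```r
# Stand-in data: iris replaces the HAR data set (assumption for illustration).
set.seed(3333)
train_idx <- sample(nrow(iris), 100)
train_raw <- iris[train_idx, ]
test_raw  <- iris[-train_idx, ]

# Fit PCA on the numeric predictors of the training rows only.
pca_fit <- prcomp(train_raw[, 1:4], center = TRUE, scale. = TRUE)

# cbind() with a character vector coerces the whole matrix to character,
# so the PC scores are no longer numeric.
bad <- cbind(pca_fit$x[, 1:2], Species = as.character(train_raw$Species))
is.character(bad)

# data.frame() keeps the scores numeric and the class label as a factor.
train_pc <- data.frame(pca_fit$x[, 1:2], Species = train_raw$Species)
str(train_pc)

# Project the test rows onto the same components with predict(),
# rather than running a second PCA on the test set.
test_scores <- predict(pca_fit, newdata = test_raw[, 1:4])
test_pc <- data.frame(test_scores[, 1:2], Species = test_raw$Species)
str(test_pc)
```

With the real data, a data frame built like train_pc is what would go into train(Activity ~ ., data = ...), and the test file would be projected with the same predict() call before predicting. This may also help with the memory error, since non-numeric predictor columns get expanded into many dummy variables by the formula interface, but I have not verified that on this dataset.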
