Subset a column in a dataframe based on another dataframe/list


BCArg:

I have the following table1, which is a dataframe consisting of 6 columns and 8083 rows. Below is the head of table1:

|gene ID        |   prom_65|   prom_66|  amast_69|  amast_70|   p_value|
|:--------------|---------:|---------:|---------:|---------:|---------:|
|LdBPK_321470.1 |   24.7361|   25.2550|   31.2974|   45.4209| 0.2997430|
|LdBPK_251900.1 |  107.3580|  112.9870|   77.4182|   86.3211| 0.0367792|
|LdBPK_331430.1 |   72.0639|   86.1486|   68.5747|   77.8383| 0.2469355|
|LdBPK_100640.1 |   43.8766|   53.4004|   34.0255|   38.4038| 0.1299948|
|LdBPK_330360.1 | 2382.8700| 1871.9300| 2013.4200| 2482.0600| 0.8466225|
|LdBPK_090870.1 |   49.6488|   53.7134|   59.1175|   66.0931| 0.0843242|

I have another dataframe called accessions40, a list of 510 gene IDs. It is a subset of the first column of table1, i.e. all its values (510) are contained in the first column of table1 (8083). The head of accessions40 looks as follows:

|V1             |
|:--------------|
|LdBPK_330360.1 |
|LdBPK_283000.1 |
|LdBPK_360210.1 |
|LdBPK_261550.1 |
|LdBPK_367320.1 |
|LdBPK_361420.1 |

What I'm trying to do is the following: I want to generate a new table2 that contains in its first column (gene ID) only the values in accessions40, along with the corresponding values from the other five columns of table1. In other words, I want to subset the first column of table1 based on the values in accessions40.

akrun

We can use %in% to get a logical vector and subset the rows of "table1" based on that.

subset(table1, gene_ID %in% accessions40$V1)
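To make the %in% step concrete, here is a minimal sketch on toy data. It assumes the ID column is named gene_ID, as in the call above; the three rows are copied from the head of table1 shown in the question, while the two-row accessions40 is made up for illustration.

```r
# Toy stand-ins for table1 and accessions40 (not the real 8083 x 6 data)
table1 <- data.frame(
  gene_ID = c("LdBPK_321470.1", "LdBPK_251900.1", "LdBPK_330360.1"),
  prom_65 = c(24.7361, 107.3580, 2382.8700),
  p_value = c(0.2997430, 0.0367792, 0.8466225),
  stringsAsFactors = FALSE
)
accessions40 <- data.frame(
  V1 = c("LdBPK_330360.1", "LdBPK_321470.1"),  # hypothetical selection
  stringsAsFactors = FALSE
)

# Logical vector: TRUE for each row of table1 whose ID is listed in accessions40
keep <- table1$gene_ID %in% accessions40$V1

# Keep the matching rows and all columns
table2 <- table1[keep, ]
print(table2)
```

The rows of table2 come back in the order they have in table1, not in the order of accessions40, since %in% only tests membership.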

A better option is data.table

library(data.table)
setDT(table1)[gene_ID %chin% accessions40$V1]

or use filter from dplyr

library(dplyr)
table1 %>%
      filter(gene_ID %in% accessions40$V1)
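Whichever variant is used, it may be worth checking that every requested ID was actually found, since the question expects all 510 values to be present in table1. A rough sketch in base R, again on toy data and assuming the column names gene_ID and V1; the second ID is deliberately absent from the toy table1 to show what a miss looks like:

```r
# Toy stand-ins (illustrative only)
table1 <- data.frame(
  gene_ID = c("LdBPK_321470.1", "LdBPK_251900.1", "LdBPK_330360.1"),
  prom_65 = c(24.7361, 107.3580, 2382.8700),
  stringsAsFactors = FALSE
)
accessions40 <- data.frame(
  V1 = c("LdBPK_330360.1", "LdBPK_283000.1"),
  stringsAsFactors = FALSE
)

table2 <- table1[table1$gene_ID %in% accessions40$V1, ]

# IDs that were requested but do not occur in table1
# (empty if accessions40 really is a subset of the gene_ID column)
missing_ids <- setdiff(accessions40$V1, table1$gene_ID)

# TRUE only when every requested ID was found
all_found <- length(missing_ids) == 0
print(missing_ids)
```

On the real data, an empty missing_ids together with nrow(table2) equal to 510 would suggest the subset is complete, provided the IDs are unique in both tables.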
