Subset a dataframe based on another dataframe with multiple conditions


Lin Silin

I have a list of methylation array dataframes called betatable. Each dataframe looks like this:

sample_A sample_B ... chr    position
0.5      0.3          chr1   75939
0.3      0.6          chr2   11195
...

I want to subset the above dataframe by chr and position range to generate another dataframe. The ranges come from a second dataframe, genes_pos:

gene   chr    range_lower   range_upper
ABC    chr1   34959         69593
...

I was thinking of using lapply but couldn't figure it out. Thanks in advance.

Matt W.

In this example you can use dplyr::inner_join

Reproducible example:

set.seed(123)
x <- data.frame(
  x = sample(1:100, 100, replace = TRUE),
  y = sample(1:100, 100, replace = TRUE),
  chr = sample(c("chr1", "chr2", "chr3"), 100, replace = TRUE),
  Position = sample(1:10000, 100, replace = TRUE)
)
genes <- data.frame(
  gene = c("gene1", "gene2", "gene3"),
  chr = c("chr1", "chr2", "chr3"),
  rangelower = c(1, 3000, 6000),
  rangeupper = c(2999, 5999, 10001)
)

Inner join on chr, then filter by the lower and upper bounds:

library(dplyr)

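new_df <- x %>%
  # pair every row of x with every gene on the same chromosome
  inner_join(genes, by = "chr") %>%
  # keep rows strictly inside the gene's range (use <= / >= to include the bounds)
  filter(Position < rangeupper, Position > rangelower)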

View Results:

> head(new_df)
    x  y  chr Position  gene rangelower rangeupper
1  90 61 chr1       83 gene1          1       2999
2  96 94 chr2     3896 gene2       3000       5999
3  90 15 chr3     8029 gene3       6000      10001
4  96 41 chr3     8569 gene3       6000      10001
5 100 22 chr3     7040 gene3       6000      10001
6  66 37 chr1     1039 gene1          1       2999 

Then we can split the dataframe by gene, which gives a list with one dataframe per gene:

list_dfs <- split(new_df, new_df$gene)
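
Since the question mentions a list of dataframes and lapply, here is a minimal sketch of how the same join-then-filter idea might be applied to every element of betatable. It uses base R's merge instead of dplyr so it runs without extra packages. The toy values, the second gene "DEF", the list element names array1/array2, and the helper name subset_by_gene are all made up for illustration; only the column names come from the question. It also assumes the range bounds should be inclusive, which differs from the strict inequalities used above.

```r
# Hypothetical toy data using the column names from the question.
betatable <- list(
  array1 = data.frame(
    sample_A = c(0.5, 0.3, 0.8),
    sample_B = c(0.3, 0.6, 0.1),
    chr      = c("chr1", "chr2", "chr1"),
    position = c(75939, 11195, 40000)
  ),
  array2 = data.frame(
    sample_A = c(0.2, 0.9),
    sample_B = c(0.4, 0.7),
    chr      = c("chr1", "chr2"),
    position = c(50000, 12000)
  )
)

genes_pos <- data.frame(
  gene        = c("ABC", "DEF"),   # "DEF" is an invented second gene
  chr         = c("chr1", "chr2"),
  range_lower = c(34959, 10000),
  range_upper = c(69593, 11500)
)

# Hypothetical helper: join one dataframe to the gene table on chr,
# keep rows whose position falls inside the gene's range (inclusive),
# then split the hits into one dataframe per gene.
subset_by_gene <- function(df, genes) {
  joined <- merge(df, genes, by = "chr")
  in_range <- joined$position >= joined$range_lower &
    joined$position <= joined$range_upper
  hits <- joined[in_range, , drop = FALSE]
  split(hits, hits$gene)
}

# Apply the helper to every dataframe in the list.
result <- lapply(betatable, subset_by_gene, genes = genes_pos)
```

With this layout, result$array1$ABC would hold the rows of the first dataframe that fall inside gene ABC's range. Note that joining on chr alone pairs every row with every gene on that chromosome before filtering, so the intermediate table can get large on real array data.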
