Subset pandas dataframe based on & operation on columns from another dataframe


ruby

I have two different datasets. I want to create a new dataset based on conditions on the column values in these two data frames.

import pandas as pd

d1=pd.DataFrame({'ID':[57,58,59,68,61],'Period':['Day_3','Day_4','Day_5','Day_3','Day_2'],'pay':[1000,3000,2000,1000,5000]})
d2=pd.DataFrame({'ID':[68,58,59,42],'Period':['Day_1','Day_8','Day_9','Day_6'],'pay':[10000,30000,20000,10000]})

d1 and d2:

    ID  Period  pay                        ID   Period  pay
0   57  Day_3   1000                   0   68   Day_1   10000
1   58  Day_4   3000                   1   58   Day_8   30000
2   59  Day_5   2000                   2   59   Day_9   20000
3   68  Day_3   1000                   3   42   Day_6   10000
4   61  Day_2   5000

If these conditions hold, temp will be the subset:

temp=d1[d1.ID.isin(d2.ID) & d1['Period']<=d2['Period']]

d1[d1.ID.isin(d2.ID)] gives partial results, and d1['Period']<=d2['Period'] throws the error ValueError: Can only compare identically-labeled Series objects.

I extracted the numbers from the days, stored them in a day_numbers column, and ran the code above, but I got the same error.

I need the result to be

   ID   Period  pay
0  58   Day_4   3000
1  59   Day_5   2000

How to get this result?
  
Shubham Sharma

Use DataFrame.merge on the ID column, create a boolean mask by comparing the numeric part of Period, then use this mask to filter the rows:

df = d1.merge(d2[['ID', 'Period']], on='ID', suffixes=['', '_r'])
mask = (
    df['Period'].str.split('_').str[-1].astype(int) <=
    df['Period_r'].str.split('_').str[-1].astype(int)
)
df = df[mask].drop(columns='Period_r')  # keyword form; the positional axis argument was removed in pandas 2.0

Result:

print(df)

   ID Period   pay
0  58  Day_4  3000
1  59  Day_5  2000
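
The original comparison raised because d1 has five rows and d2 has four, so the two Period Series carry different index labels and pandas refuses to compare them element by element. As a rough alternative to the merge, the sketch below looks up each d1 ID's Period in d2 with Series.map, which yields a Series aligned to d1's own index. It assumes ID is unique in d2, and day_number is a hypothetical helper name introduced only for this illustration.

```python
import pandas as pd

d1 = pd.DataFrame({
    'ID': [57, 58, 59, 68, 61],
    'Period': ['Day_3', 'Day_4', 'Day_5', 'Day_3', 'Day_2'],
    'pay': [1000, 3000, 2000, 1000, 5000],
})
d2 = pd.DataFrame({
    'ID': [68, 58, 59, 42],
    'Period': ['Day_1', 'Day_8', 'Day_9', 'Day_6'],
    'pay': [10000, 30000, 20000, 10000],
})


def day_number(period):
    """Hypothetical helper: turn 'Day_4' into 4; missing values stay NaN."""
    return pd.to_numeric(period.str.split('_').str[-1])


# Look up each d1 ID's Period in d2. IDs absent from d2 become NaN.
# Assumption: ID is unique in d2, otherwise map raises on the duplicate index.
d2_period = d1['ID'].map(d2.set_index('ID')['Period'])

# Both sides now share d1's index, so the comparison is well defined.
# Comparisons against NaN are False, so unmatched IDs are filtered out.
mask = day_number(d1['Period']) <= day_number(d2_period)
temp = d1[mask]
print(temp)
```

Unlike the merge version, this keeps d1's original row labels (1 and 2 here); append .reset_index(drop=True) if a fresh 0-based index is wanted.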
