Manipulate row values of subset dataframe in Pandas
abon
I have a dataframe df1 like this:
ID1 ID2
0 foo bar
1 fizz buzz
and another dataframe df2 like this:
ID1 ID2 Count Code
0 abc def 7 B
1 fizz buzz 5 B
2 fizz1 buzz2 9 C
3 foo bar 6 B
4 foo bar 6 Z
What I want to do is filter the rows of the second dataframe whose ID1 and ID2 match a row in the first dataframe into a sub-dataframe sub_df, and then apply the following code to sub_df:
sub_df.loc[sub_df["Count"] >= 5, "Code"] = "A"
sub_df:
ID1 ID2 Count Code
1 fizz buzz 5 B
3 foo bar 6 B
4 foo bar 6 Z
In the end, I want to produce a dataframe df that looks like this:
ID1 ID2 Count Code
0 abc def 7 B
1 fizz buzz 5 A
2 fizz1 buzz2 9 C
3 foo bar 6 A
4 foo bar 6 A
What should I do? Thanks.
Jesler
You can use Series.str.cat to combine the ID1 and ID2 columns into a single key, then use Series.isin to test for membership:
id2 = df2['ID1'].str.cat(df2['ID2'], sep='_')
id1 = df1['ID1'].str.cat(df1['ID2'], sep='_')
df2.loc[(df2["Count"] >= 5) & id2.isin(id1), "Code"] = "A"
print (df2)
ID1 ID2 Count Code
0 abc def 7 B
1 fizz buzz 5 A
2 fizz1 buzz2 9 C
3 foo bar 6 A
4 foo bar 6 A
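For anyone who wants to run this end to end, here is a minimal self-contained sketch. The frames are rebuilt from the sample data in the question, and the names pairs_df1, pairs_df2 and in_df1 are mine. Note that it swaps the string concatenation for a MultiIndex comparison, which is an alternative rather than the method above: it compares the (ID1, ID2) pairs as tuples, so it should not be affected if the separator happens to appear inside an ID (for example 'a_b' + 'c' and 'a' + 'b_c' both join to 'a_b_c').

```python
import pandas as pd

# Sample data rebuilt from the question.
df1 = pd.DataFrame({"ID1": ["foo", "fizz"], "ID2": ["bar", "buzz"]})
df2 = pd.DataFrame({
    "ID1": ["abc", "fizz", "fizz1", "foo", "foo"],
    "ID2": ["def", "buzz", "buzz2", "bar", "bar"],
    "Count": [7, 5, 9, 6, 6],
    "Code": ["B", "B", "C", "B", "Z"],
})

# Compare (ID1, ID2) pairs as tuples instead of joined strings.
pairs_df1 = pd.MultiIndex.from_frame(df1[["ID1", "ID2"]])
pairs_df2 = pd.MultiIndex.from_frame(df2[["ID1", "ID2"]])
in_df1 = pairs_df2.isin(pairs_df1)  # boolean array, one entry per df2 row

# Update df2 in place, only where both conditions hold.
df2.loc[(df2["Count"] >= 5) & in_df1, "Code"] = "A"
print(df2)
```

Because the mask is applied to df2 directly, there is no intermediate sub_df to write back afterwards.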
Edit:
After testing, it works fine for me:
print (df1)
ID1 ID2
0 foo bar
1 fizz buzz
print (df2)
ID1 ID2 date price
0 abc def 2019-08-01 1
1 fizz buzz 2019-08-02 2
2 fizz1 buzz2 2019-08-02 3
3 foo bar 2019-08-03 4
4 foo bar 2019-08-01 5
import numpy as np
import pandas as pd
df2["date"] = pd.to_datetime(df2["date"])
df2.loc[(df2["date"] != '2019-08-01') & (df2['ID1'].isin(df1['ID1'])), "price"] = np.nan
print (df2)
ID1 ID2 date price
0 abc def 2019-08-01 1.0
1 fizz buzz 2019-08-02 NaN <- set to NaN because ID1 is in df1
2 fizz1 buzz2 2019-08-02 3.0
3 foo bar 2019-08-03 NaN <- set to NaN because ID1 is in df1
4 foo bar 2019-08-01 5.0 <- not set to NaN: ID1 is in df1, but the date is 2019-08-01
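The snippet above tests ID1 only. If both ID1 and ID2 should match, one possible variant is a left merge with indicator=True, sketched below under a few assumptions: the data is rebuilt from the printout above, prices are stored as floats so that NaN can be assigned without changing the column dtype, and the names flags and in_df1 are mine.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the printout above (prices as floats by assumption).
df1 = pd.DataFrame({"ID1": ["foo", "fizz"], "ID2": ["bar", "buzz"]})
df2 = pd.DataFrame({
    "ID1": ["abc", "fizz", "fizz1", "foo", "foo"],
    "ID2": ["def", "buzz", "buzz2", "bar", "bar"],
    "date": ["2019-08-01", "2019-08-02", "2019-08-02", "2019-08-03", "2019-08-01"],
    "price": [1.0, 2.0, 3.0, 4.0, 5.0],
})
df2["date"] = pd.to_datetime(df2["date"])

# A left merge keeps df2's row order; _merge == "both" marks pairs found in df1.
# drop_duplicates guards against df1 repeating a pair, which would duplicate rows.
flags = df2[["ID1", "ID2"]].merge(
    df1[["ID1", "ID2"]].drop_duplicates(),
    on=["ID1", "ID2"],
    how="left",
    indicator=True,
)
# Convert to a plain array so the mask is applied by position, not by index label.
in_df1 = (flags["_merge"] == "both").to_numpy()

df2.loc[(df2["date"] != pd.Timestamp("2019-08-01")) & in_df1, "price"] = np.nan
print(df2)
```

With this sample data the result is the same as matching on ID1 alone; the two approaches only differ when an ID1 value appears in df1 with a different ID2.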