Manipulate row values of subset dataframe in Pandas


abon

I have a dataframe df1 like this:

    ID1    ID2
0   foo    bar
1   fizz   buzz

and another, df2, like this:

    ID1    ID2    Count    Code   
0   abc    def      7        B
1   fizz   buzz     5        B
2   fizz1  buzz2    9        C
3   foo    bar      6        B
4   foo    bar      6        Z

What I want to do is take the rows of the second dataframe whose ID1 and ID2 both match a row in the first dataframe as a sub-dataframe sub_df, and then apply the following code to sub_df: sub_df.loc[sub_df["Count"] >= 5, "Code"] = "A"

sub_df:

    ID1    ID2    Count    Code   
1   fizz   buzz     5        B
3   foo    bar      6        B
4   foo    bar      6        Z

In the end, I want to produce a dataframe df that looks like this:

    ID1    ID2    Count    Code   
0   abc    def      7        B
1   fizz   buzz     5        A
2   fizz1  buzz2    9        C
3   foo    bar      6        A
4   foo    bar      6        A

What should I do? Thanks.

Jesler

You can use Series.isin to test for membership, after combining the ID1 and ID2 columns of each dataframe into a single key with Series.str.cat:

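# Join ID1 and ID2 into a single key per row, e.g. "fizz_buzz"
id2 = df2['ID1'].str.cat(df2['ID2'], sep='_')
id1 = df1['ID1'].str.cat(df1['ID2'], sep='_')

# Set Code to "A" where Count >= 5 and the combined key also appears in df1
df2.loc[(df2["Count"] >= 5) & id2.isin(id1), "Code"] = "A"
print(df2)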
     ID1    ID2  Count Code
0    abc    def      7    B
1   fizz   buzz      5    A
2  fizz1  buzz2      9    C
3    foo    bar      6    A
4    foo    bar      6    A
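
One caveat with joining the columns into a string: if an ID can itself contain the separator, two different pairs may produce the same key (for example "a_b" + "c" and "a" + "b_c" both give "a_b_c"). A possible alternative, sketched here as a variation rather than as part of the original answer, is to compare the (ID1, ID2) pairs as tuples through a MultiIndex. The frames are rebuilt from the question's sample data so the snippet runs on its own; the names keys and in_df1 are only illustrative.

```python
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({"ID1": ["foo", "fizz"], "ID2": ["bar", "buzz"]})
df2 = pd.DataFrame({
    "ID1": ["abc", "fizz", "fizz1", "foo", "foo"],
    "ID2": ["def", "buzz", "buzz2", "bar", "bar"],
    "Count": [7, 5, 9, 6, 6],
    "Code": ["B", "B", "C", "B", "Z"],
})

keys = ["ID1", "ID2"]  # illustrative name for the columns that must match

# Build one (ID1, ID2) tuple per row and test membership on the tuples,
# so no separator character is involved.
in_df1 = pd.MultiIndex.from_frame(df2[keys]).isin(
    pd.MultiIndex.from_frame(df1[keys])
)

# Same assignment as above, using the tuple-based mask
df2.loc[(df2["Count"] >= 5) & in_df1, "Code"] = "A"
print(df2)
```

On the sample data this should give the same Code column as the str.cat version.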

Edit:

After testing, it works fine for me:

print (df1)
    ID1   ID2
0   foo   bar
1  fizz  buzz

print (df2)
     ID1    ID2        date  price
0    abc    def  2019-08-01      1
1   fizz   buzz  2019-08-02      2
2  fizz1  buzz2  2019-08-02      3
3    foo    bar  2019-08-03      4
4    foo    bar  2019-08-01      5

df2["date"] = pd.to_datetime(df2["date"])
df2.loc[(df2["date"] != '2019-08-01') & (df2['ID1'].isin(df1['ID1'])), "price"] = np.nan
print (df2)
     ID1    ID2       date  price
0    abc    def 2019-08-01    1.0
1   fizz   buzz 2019-08-02    NaN <- set to NaN because the ID matches
2  fizz1  buzz2 2019-08-02    3.0
3    foo    bar 2019-08-03    NaN <- set to NaN because the ID matches
4    foo    bar 2019-08-01    5.0 <- not set to NaN: the ID matches, but the date is 2019-08-01
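
For anyone who wants to reproduce the test above, here is a self-contained sketch with the imports and the sample frames written out. It matches on ID1 only, like the snippet above. The explicit cast of price to float is an assumption added here so that NaN can be stored without relying on an implicit dtype change, and the name mask is only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data copied from the printed frames above
df1 = pd.DataFrame({"ID1": ["foo", "fizz"], "ID2": ["bar", "buzz"]})
df2 = pd.DataFrame({
    "ID1": ["abc", "fizz", "fizz1", "foo", "foo"],
    "ID2": ["def", "buzz", "buzz2", "bar", "bar"],
    "date": ["2019-08-01", "2019-08-02", "2019-08-02", "2019-08-03", "2019-08-01"],
    "price": [1, 2, 3, 4, 5],
})

df2["date"] = pd.to_datetime(df2["date"])
# Assumption: cast to float up front so the column can hold NaN
df2["price"] = df2["price"].astype(float)

# Rows whose ID1 appears in df1 and whose date is not 2019-08-01
mask = (df2["date"] != pd.Timestamp("2019-08-01")) & df2["ID1"].isin(df1["ID1"])
df2.loc[mask, "price"] = np.nan
print(df2)
```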
