Pandas fillna with lookup table


Zambi

Having some trouble filling in NaNs. I have a dataframe column with several NaNs and want to populate them with values derived from a "lookup table" based on the values in another column. (You might recognize my data from the Titanic dataset)...

    Pclass   Age
0   1        33
1   3        24
2   1        23
3   2        NaN
4   1        Nan

I want to fill the NaNs with values from the 'pclass_lookup' series:

pclass_lookup
1        38.1
2        29.4
3        25.2

I've tried fillna with indexing:

df.Age.fillna(pclass_lookup[df.Pclass])

but it gives me an error:

    ValueError: cannot reindex from a duplicate axis

I also tried a lambda:

df.Age.map(lambda x: x if x else pclass_lookup[df.Pclass])

However, this doesn't seem to solve the problem either. Am I totally missing the boat here?

Ed Chum

First, you have a duff value in row 4: it is actually the string 'Nan', which is not the same as NaN, so even if your code did work, that value would never be replaced.

So you need to replace that duff value first; then you can call map to perform the lookup for the NaN values:

In [317]:

df['Age'] = df['Age'].replace('Nan', np.nan)
df.loc[df['Age'].isnull(),'Age'] = df['Pclass'].map(df1.pclass_lookup)
df
Out[317]:
   Pclass   Age
0       1    33
1       3    24
2       1    23
3       2  29.4
4       1  38.1
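
For readers who want to reproduce this end to end, here is a minimal self-contained sketch. It rebuilds the question's sample data by hand, and it makes two assumptions that differ from the session above: the lookup is a standalone Series named pclass_lookup (not a column of df1), and pd.to_numeric with errors="coerce" is swapped in for the replace call to turn the stray 'Nan' string into a real NaN.

```python
import numpy as np
import pandas as pd

# Sample data from the question; the string "Nan" in row 4 is the duff value.
df = pd.DataFrame({
    "Pclass": [1, 3, 1, 2, 1],
    "Age": [33, 24, 23, np.nan, "Nan"],
})

# Lookup table: the index holds Pclass, the values hold the fill ages.
pclass_lookup = pd.Series([38.1, 29.4, 25.2], index=[1, 2, 3])

# Coerce anything non-numeric (including the string "Nan") to a real NaN.
df["Age"] = pd.to_numeric(df["Age"], errors="coerce")

# Fill only the missing rows, looking each one up by its Pclass.
mask = df["Age"].isnull()
df.loc[mask, "Age"] = df.loc[mask, "Pclass"].map(pclass_lookup)

print(df)
```

Mapping only the masked rows keeps the lookup work proportional to the number of missing values; mapping the whole column, as in the answer, gives the same result because the assignment aligns on the index.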

Timings

For a df with 5000 rows:

In [26]:

%timeit df.loc[df['Age'].isnull(),'Age'] = df['Pclass'].map(df1.pclass_lookup)
100 loops, best of 3: 2.41 ms per loop
In [27]:

%%timeit
def remove_na(x):
    if pd.isnull(x['Age']):
        return df1[x['Pclass']]
    else:
        return x['Age']
df['Age'] = df.apply(remove_na, axis=1)
1 loops, best of 3: 278 ms per loop
In [28]:

%%timeit
nulls = df.loc[df.Age.isnull(), 'Pclass']
df.loc[df.Age.isnull(), 'Age'] = df1.loc[nulls].values
100 loops, best of 3: 3.37 ms per loop

So you can see that apply, because it iterates row by row, scales poorly compared to the other two methods, which are vectorised, and map is still the fastest.
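
As a side note on the fillna attempt in the question: pclass_lookup[df.Pclass] returns a Series indexed by the Pclass values, which contain duplicates, and that is most likely what triggers the reindexing error when fillna tries to align it. A hedged sketch of a variant that should avoid this, assuming pclass_lookup is a standalone Series and Age already holds real NaNs, passes fillna a Series produced by map, which is indexed like df:

```python
import numpy as np
import pandas as pd

# Sample data from the question, with the 'Nan' string already cleaned to NaN.
df = pd.DataFrame({
    "Pclass": [1, 3, 1, 2, 1],
    "Age": [33.0, 24.0, 23.0, np.nan, np.nan],
})
pclass_lookup = pd.Series([38.1, 29.4, 25.2], index=[1, 2, 3])

# map returns a Series with the same index as df, so fillna can align it
# row by row and only the NaN positions take the looked-up value.
filled = df["Age"].fillna(df["Pclass"].map(pclass_lookup))

print(filled)
```

This was not part of the timings above, so no claim is made here about how it compares in speed.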
