Pandas idiomatic way to customize fillna


Kyiv

I have time series data in the following format, where each recorded value is the cumulative amount since the last recording. I want to "scatter" each accumulated value evenly across the NaN entries that precede it, so that this input:

import numpy as np
import pandas as pd

s = pd.Series([0, 0, np.nan, np.nan, 75, np.nan, np.nan, np.nan, np.nan, 50],
              pd.date_range(start="Jan 1 2016", end="Jan 10 2016", freq='D'))

2016-01-01      0.0
2016-01-02      0.0
2016-01-03      NaN
2016-01-04      NaN
2016-01-05     75.0
2016-01-06      NaN
2016-01-07      NaN
2016-01-08      NaN
2016-01-09      NaN
2016-01-10     50.0

becomes the following output:

2016-01-01     0.0
2016-01-02     0.0
2016-01-03    25.0
2016-01-04    25.0
2016-01-05    25.0
2016-01-06    10.0
2016-01-07    10.0
2016-01-08    10.0
2016-01-09    10.0
2016-01-10    10.0

Is there an idiomatic Pandas way to do this instead of just doing a for loop over the data? I've tried various things involving fillna, dropna, isnull, using shift to check adjacent values, etc., but I can't see how to piece it together.

wisdom

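This might work: build a group variable with cumsum (taken from the end of the series) so that each block of missing values is grouped with the recorded value that closes it, then divide that value by the block size:

# g.iloc[-1] is the last value in the group by position; plain g[-1] is treated as a label lookup by newer pandas on a DatetimeIndex
s.groupby(s.notnull()[::-1].cumsum()[::-1]).transform(lambda g: g.iloc[-1]/g.size)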

#2016-01-01     0.0
#2016-01-02     0.0
#2016-01-03    25.0
#2016-01-04    25.0
#2016-01-05    25.0
#2016-01-06    10.0
#2016-01-07    10.0
#2016-01-08    10.0
#2016-01-09    10.0
#2016-01-10    10.0
#Freq: D, dtype: float64
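
To make the grouping key easier to follow, here is a minimal sketch that splits the one-liner into named steps. The variable names (recorded, group_key, result) are only illustrative and are not part of the original answer.

```python
import numpy as np
import pandas as pd

s = pd.Series([0, 0, np.nan, np.nan, 75, np.nan, np.nan, np.nan, np.nan, 50],
              pd.date_range(start="Jan 1 2016", end="Jan 10 2016", freq="D"))

# True where a value was recorded, False where it is NaN.
recorded = s.notnull()

# Reverse the series, count recorded values cumulatively, then reverse back.
# Every NaN ends up with the same key as the next recorded value after it.
group_key = recorded[::-1].cumsum()[::-1]
print(group_key.tolist())  # [4, 3, 2, 2, 2, 1, 1, 1, 1, 1]

# Within each group the last row holds the accumulated total;
# dividing it by the group size spreads it evenly over the block.
result = s.groupby(group_key).transform(lambda g: g.iloc[-1] / g.size)
print(result)
```

The key values themselves do not matter, only that rows belonging to the same block share one value.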

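Or, another option that builds the same groups from the front of the series, starting a new group on the row after each recorded value:

# g.iloc[-1] selects the last value in the group by position
s.groupby(s.shift().notnull().cumsum()).transform(lambda g: g.iloc[-1]/g.size)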

#2016-01-01     0.0
#2016-01-02     0.0
#2016-01-03    25.0
#2016-01-04    25.0
#2016-01-05    25.0
#2016-01-06    10.0
#2016-01-07    10.0
#2016-01-08    10.0
#2016-01-09    10.0
#2016-01-10    10.0
#Freq: D, dtype: float64
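
As a possible variation (not from the answer itself), the lambda can be swapped for the built-in "last" and "size" group reductions, which avoids calling a Python function per group. This is a sketch that assumes every block ends with a recorded value, as in the sample data; a trailing run of NaNs with no closing value would stay NaN.

```python
import numpy as np
import pandas as pd

s = pd.Series([0, 0, np.nan, np.nan, 75, np.nan, np.nan, np.nan, np.nan, 50],
              pd.date_range(start="Jan 1 2016", end="Jan 10 2016", freq="D"))

# Same forward-built grouping key as in the shift-based option.
group_key = s.shift().notnull().cumsum()
grouped = s.groupby(group_key)

# "last" broadcasts the last non-null value of each group,
# "size" broadcasts the number of rows in each group (NaNs included).
result = grouped.transform("last") / grouped.transform("size")
print(result)

# Sanity check: spreading the totals must not change the overall sum.
assert np.isclose(result.sum(), s.sum())
```

The closing assertion is a cheap way to confirm that the redistribution preserved the recorded totals.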
