Is there any faster way than pandas fillna()?


wanderer

Pandas fillna() is very slow, especially if a lot of data is missing in the DataFrame.

Is there a faster way than this?

(I know it would help to drop some of the rows and/or columns that contain NA.)

Jesler

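I tested it:

import numpy as np
import pandas as pd

np.random.seed(123)
N = 60000
# 60000 x 20 frame of strings, with roughly 30% of the cells missing
df = pd.DataFrame(np.random.choice(['a', None], size=(N, 20), p=(.7, .3)))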

In [333]: %timeit df.fillna('b')
93.5 ms ± 1.28 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [337]: %timeit df[df.isna()] = 'b'
122 ms ± 2.75 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

An alternative solution that writes to the underlying NumPy array (but I find it a bit hacky):

#pandas below 0.24
In [335]: %timeit df.values[df.isna()] = 'b'
56.7 ms ± 799 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

#pandas 0.24+
In [339]: %timeit df.to_numpy()[df.isna()] = 'b'
56.5 ms ± 951 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
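
One caveat with the timings above: writing through df.values or df.to_numpy() only changes df when pandas hands back a writable view of its data. That is not guaranteed, for example with mixed dtypes or with Copy-on-Write enabled. Below is a minimal sketch of a more defensive variant that fills an explicit NumPy copy and builds a new DataFrame from it. The helper name fill_via_numpy and the smaller N are my own assumptions for illustration, and the extra copy may cost some of the speed gain, so time it on your own data.

```python
import timeit

import numpy as np
import pandas as pd

np.random.seed(123)
N = 1000  # smaller than the 60000 used above so the check runs quickly
df = pd.DataFrame(np.random.choice(['a', None], size=(N, 20), p=(.7, .3)))


def fill_via_numpy(frame, value):
    # fill_via_numpy is a hypothetical helper name, not a pandas API.
    # Copy the data into a fresh NumPy array so the write never depends
    # on pandas returning a writable view of its internals.
    arr = frame.to_numpy(copy=True)
    arr[frame.isna().to_numpy()] = value
    return pd.DataFrame(arr, index=frame.index, columns=frame.columns)


expected = df.fillna('b')
result = fill_via_numpy(df, 'b')

# Check that both approaches produce the same values.
print((result.to_numpy() == expected.to_numpy()).all())

# Timings vary by machine and pandas version.
print(timeit.timeit(lambda: df.fillna('b'), number=5))
print(timeit.timeit(lambda: fill_via_numpy(df, 'b'), number=5))
```

Unlike the in-place version, this returns a new DataFrame and leaves df untouched.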
