Is there any faster way than pandas fillna()?

wanderer

Pandas fillna() is very slow, especially if a lot of data is missing in the dataframe.

Is there a faster way than this?

(I know it would help if only some rows and/or columns containing NA were removed)

Jesler

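I ran some timings:

import numpy as np
import pandas as pd

np.random.seed(123)
N = 60000
# 60000 x 20 object frame, roughly 30% of the cells missing
df = pd.DataFrame(np.random.choice(['a', None], size=(N, 20), p=(.7, .3)))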

In [333]: %timeit df.fillna('b')
93.5 ms ± 1.28 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [337]: %timeit df[df.isna()] = 'b'
122 ms ± 2.75 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

An alternative that assigns into the underlying NumPy array (but I find it a bit hacky):

#pandas below 0.24
In [335]: %timeit df.values[df.isna()] = 'b'
56.7 ms ± 799 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

#pandas 0.24+
In [339]: %timeit df.to_numpy()[df.isna()] = 'b'
56.5 ms ± 951 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
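One caveat with the array trick above: it only changes df when .values / to_numpy() hands back a view of the frame's data, which generally holds only for single-dtype frames, and with copy-on-write enabled the returned array is read-only, so the assignment raises instead. A minimal sketch of a variant that does not rely on in-place mutation is below. The helper name fill_via_numpy is made up for illustration, it is not timed here, and it assumes every column can be treated as object dtype:

```python
import numpy as np
import pandas as pd

def fill_via_numpy(df, value):
    """Return a new DataFrame with missing cells replaced by `value`.

    Hypothetical helper for illustration; assumes the frame can be
    handled as a single object-dtype block.
    """
    mask = df.isna().to_numpy()          # boolean array, True where a cell is missing
    arr = df.to_numpy(dtype=object)      # may be a copy; it is never written to
    filled = np.where(mask, value, arr)  # builds a new array, df stays untouched
    return pd.DataFrame(filled, index=df.index, columns=df.columns)

np.random.seed(123)
small = pd.DataFrame(np.random.choice(['a', None], size=(1000, 5), p=(.7, .3)))
out = fill_via_numpy(small, 'b')
```

Since this returns a new frame rather than patching the old one, it behaves like df.fillna('b') without inplace=True. Whether it is actually faster depends on the pandas version and dtypes, so time it on your own data.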
