Is there an idiomatic pandas way to write this custom row-duplicating function?


do jones

I have a pd.DataFrame where each row represents a group of people. Each group has an id (my real dataframe has several columns, but in my example dataframe they are summarized by the single column "id"). Each group represents several people (column "size").

I am trying to split these groups into smaller groups of at most "max_size" people. For example, if max_size = 5, the row with "id" = "foo" and "size" = 13 should be replaced by three rows, all with "id" = "foo", and with "size" = 5, "size" = 5 and "size" = 3 respectively.
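The splitting rule for a single size can be sketched in plain Python before involving pandas. The helper name `split_size` is hypothetical (it is not part of the question), and the sketch assumes sizes are non-negative integers and that a remainder of 0 should not produce an extra row.

```python
def split_size(size, max_size=5):
    """Split one group size into chunks of at most max_size."""
    # divmod gives the number of full chunks and what is left over
    full, rest = divmod(size, max_size)
    # Only append the remainder chunk when it is non-zero
    return [max_size] * full + ([rest] if rest else [])

print(split_size(13))  # [5, 5, 3]
print(split_size(10))  # [5, 5]
```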

I've written a function that works, but I'm looking for a more pandas idiomatic way to do this, if it exists.

My function is

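import pandas as pd

def custom_duplicating_function(df):
    def aux_custom_duplicating_function(row, max_size=5):
        row = row.to_dict()
        size = row["size"]
        # Ceiling division, so an exact multiple of max_size gets no extra size-0 row
        n_rows = max(1, -(-size // max_size))
        L = [row.copy() for i in range(n_rows)]
        for i in range(len(L) - 1):
            L[i]["size"] = max_size
        # The last row takes whatever is left over
        L[-1]["size"] = size - max_size * (n_rows - 1)
        return pd.DataFrame.from_dict(L)

    # One small DataFrame per input row, then stack them
    temp = df.apply(aux_custom_duplicating_function, axis=1)
    result = pd.concat([temp[i] for i in range(len(temp.index))])
    return result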

The following dataframe

test = pd.DataFrame.from_dict([{"id":"foo", "size":13},
                     {"id":"bar", "size":17},
                     {"id":"baz", "size":3}])
************
    id  size
0  foo    13
1  bar    17
2  baz     3
************

should be converted to

    id  size
0  foo     5
1  foo     5
2  foo     3
0  bar     5
1  bar     5
2  bar     5
3  bar     2
0  baz     3

Mark King

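Use explode (pandas >= 0.25):

# Turn each size into a list of chunks; skip the remainder when it is 0
test['size'] = test['size'].apply(lambda x: [5] * (x // 5) + ([x % 5] if x % 5 else []))

# One row per list element (the result keeps the original index labels)
test.explode('size')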
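The explode approach can also be wrapped in a function that leaves the input dataframe unmodified and takes the chunk size as a parameter. This is only a sketch: the names `split_groups` and `size_col` are hypothetical, and it assumes every size is a positive integer (a size of 0 would explode to NaN and make the integer cast fail).

```python
import pandas as pd

def split_groups(df, max_size=5, size_col="size"):
    """Return a new frame with each row split into chunks of at most max_size."""
    def chunks(size):
        full, rest = divmod(int(size), max_size)
        # Skip the remainder chunk when size is an exact multiple of max_size
        return [max_size] * full + ([rest] if rest else [])

    # assign builds a new frame, so the caller's df is not mutated
    out = df.assign(**{size_col: df[size_col].map(chunks)})
    out = out.explode(size_col)
    # explode leaves an object-dtype column; cast it back to integers
    out[size_col] = out[size_col].astype(int)
    return out.reset_index(drop=True)

test = pd.DataFrame({"id": ["foo", "bar", "baz"], "size": [13, 17, 3]})
result = split_groups(test)
print(result)
```

Unlike the output shown in the question, this version resets the index to 0..n-1. Dropping the `reset_index` call keeps the original row labels instead, repeated once per chunk.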
