Replacing column values by aggregating on groups


mechanical

Suppose I have the following dataframe df:

   id   x    y        timestamp
   1   32   30        1031
   1   4    105       1035
   1   8    110       1050
   2   18   10        1500
   2   40   20        1550
   2   80   10        1450
....


import pandas as pd
import numpy as np
df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2, 2],
    'x': [32, 4, 8, 18, 40, 80],
    'y': [30, 105, 110, 10, 20, 10],
    'timestamp': [1031, 1035, 1050, 1500, 1550, 1450],
})

I now have the following code:

df = df.groupby(["id"]).agg({
    'timestamp': lambda x: x.max() - x.min(),
    'x': 'mean',
    'y': 'mean'
}).reset_index()
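For reference, here is a minimal sketch of what that aggregation produces on the six sample rows shown above (the elided "...." rows are simply left out). Because agg reduces every group to a single value per column, the six input rows collapse into one row per id.

```python
import pandas as pd

# Sample data from the question (elided rows omitted)
df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2, 2],
    'x': [32, 4, 8, 18, 40, 80],
    'y': [30, 105, 110, 10, 20, 10],
    'timestamp': [1031, 1035, 1050, 1500, 1550, 1450],
})

# agg reduces each group to one row, so the result has one row per id
agg = df.groupby('id').agg({
    'timestamp': lambda s: s.max() - s.min(),
    'x': 'mean',
    'y': 'mean',
}).reset_index()

print(agg)
```

The timestamp range is correct per group, but x and y have been averaged and the individual rows are gone.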

Unfortunately, this is not what I want. I want the following resulting dataframe:

id    x     y       timestamp
-----------------------------
1    32     30      19
1    4      105     19
1    8      110     19
2    18     10      100
2    40     20      100
2    80     10      100
....

In other words, each row's timestamp should be replaced by its group's range (max minus min), while every original row and its x and y values are kept, rather than collapsing each group into a single row.
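To make the requirement concrete, one way to express it (a sketch that is separate from the answer's approach, using only the six sample rows) is to compute each group's range once with agg and then broadcast that value back onto every row by mapping on the id column.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2, 2],
    'x': [32, 4, 8, 18, 40, 80],
    'y': [30, 105, 110, 10, 20, 10],
    'timestamp': [1031, 1035, 1050, 1500, 1550, 1450],
})

# One range (max - min) per id: a Series indexed by id
ranges = df.groupby('id')['timestamp'].agg(lambda s: s.max() - s.min())

# Look up each row's id in that Series, repeating the group value on every row
df['timestamp'] = df['id'].map(ranges)
print(df)
```

This keeps all rows and leaves x and y untouched, which matches the desired output table.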

How can this be done?

Rafaelke

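IIUC, you just need to use transform with NumPy's peak-to-peak function (np.ptp), which returns a result aligned to the original rows instead of one value per group: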

df['timestamp'] = df.groupby(["id"]).timestamp.transform(np.ptp)
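If you would rather not depend on np.ptp, a sketch using only pandas' built-in string aggregations subtracts two transforms. The sample data is the six rows from the question; this is an alternative spelling, not the answer's original code.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2, 2],
    'x': [32, 4, 8, 18, 40, 80],
    'y': [30, 105, 110, 10, 20, 10],
    'timestamp': [1031, 1035, 1050, 1500, 1550, 1450],
})

g = df.groupby('id')['timestamp']
# transform('max') and transform('min') each return a Series aligned to df's
# index, so their difference is the per-group range repeated on every row
df['timestamp'] = g.transform('max') - g.transform('min')
print(df)
```

Passing the aggregation names as strings lets pandas use its built-in group reductions rather than calling a Python function once per group, which can matter when there are many groups.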
