Compare and Move Values in Pandas DataFrame Columns - Python
dirty
I populate the following pandas DataFrame:
The ID contained in the first cell of the column matches the IDs in the three cells of the second column. The content of each cell is not fixed (so they are not literal string values, but data taken from variable json api output).
How can I compare the contents of the two columns (and since the contents are fixed, I guess this has to be done variably, not using literal strings), and if there is a match, move the match to the corresponding cell in it next to it? Hope this makes sense, this is the type of output I'm looking for:
Chris
# sample data
df = pd.DataFrame(np.array([{'data': [{'id': '12345', 'type': 'education'}, {'id': '23456', 'type': 'education'}, {'id': '34567', 'type': 'education'}]},
{'data': [{'id': '45678', 'type': 'education'}, {'id': '56789', 'type': 'education'}]},
{'data': [{'id': '78999', 'type': 'education'}]}]), columns=['Edu ID'])
# create a new frame but orient the index and explode
df_e = pd.DataFrame.from_dict(df['Edu ID'].to_dict(), orient='index')['data'].explode()
# take the new frame and convert it to a list then groupby the index and create a list of ids
final_df = df.join(pd.DataFrame(df_e.tolist(), index=df_e.index).groupby(level=0)['id'].agg(list))
Edu ID id
0 {'data': [{'id': '12345', 'type': 'education'}... [12345, 23456, 34567]
1 {'data': [{'id': '45678', 'type': 'education'}... [45678, 56789]
2 {'data': [{'id': '78999', 'type': 'education'}]} [78999]
If you need to filter == Education Type
# create a new frame but orient the index and explode
df_e = pd.DataFrame.from_dict(df['Edu ID'].to_dict(), orient='index')['data'].explode()
# take the new frame and convert it to a list and create a new frame
df_edu = pd.DataFrame(df_e.tolist(), index=df_e.index)
# use join but filter type to equal education and then gorupby and convert ids to a list
final_df = df.join(df_edu[df_edu['type'] == 'education'].groupby(level=0)['id'].agg(list))