Remove string elements in a Python DataFrame based on element length


A generation

I have a pandas DataFrame with 13 columns and 60000 rows. One of these columns is named "Text" (dtype object) and contains fairly long cells of text:

      Text                                                  ID  AI  BI  GH  JB  EQ  HE  EN  MA  WE  WR
2585  obstetric gynaecologicaladmissions owing abor...    2585   0   0   0   0   0   1   0   0   0   0
507   graphic illustration process flow help organiz...   507   0   0   0   0   0   0   0   0   1   0

In some rows, words have been run together (as in the first row: gynaecologicaladmissions). To avoid this, I would like to remove all such cases from the whole dataset. I thought about removing, for each row in the "Text" column, all words longer than 13 characters.

I have tried the following code:

res.loc[res['Text'].str.len() < 13]

But it only gives two blank lines.

How can I solve this problem?

gagulaf

Let's look at an example data frame

df

    text
0   obstetric gynaecologicaladmissions owing
1   graphic illustration process flow help
2   process flow help
3   illustrationprocess flow

Since you need to check the length of each word, you have to split each string on a delimiter (a space in this case), then loop through the resulting list and keep only the words whose length is less than or equal to 13. To do this for every row, you can use apply

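def func(x):
    # x is the list of words produced by str.split()
    res = list()
    for word in x:
        # keep only words of at most 13 characters
        if len(word) <= 13:
            res.append(word)
    return " ".join(res)

# split each cell on whitespace, then filter its words
df['text'] = df['text'].str.split().apply(func)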
df
    
     text
0   obstetric owing
1   graphic illustration process flow help
2   process flow help
3   flow
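
For reference, here is a self-contained sketch of the same idea that can be run as is. It rebuilds the example frame above, shows why the filter from the question returns nothing (str.len() measures the whole cell, not each word), and applies a more compact word filter. The names drop_long_words, MAX_LEN and short_rows are made up for this illustration, and the sketch assumes the column has no missing values.

```python
import pandas as pd

MAX_LEN = 13  # threshold taken from the question


def drop_long_words(text, max_len=MAX_LEN):
    """Return text without the whitespace-separated words longer than max_len."""
    return " ".join(word for word in text.split() if len(word) <= max_len)


# Example frame from the answer above
df = pd.DataFrame({
    "text": [
        "obstetric gynaecologicaladmissions owing",
        "graphic illustration process flow help",
        "process flow help",
        "illustrationprocess flow",
    ]
})

# Series.str.len() gives the length of the whole cell, not of each word,
# so the filter from the question keeps no rows for this data.
short_rows = df.loc[df["text"].str.len() < MAX_LEN]

# Filter word by word inside every cell instead.
df["text"] = df["text"].apply(drop_long_words)
print(df)
```

If the real column can contain NaN, those cells would need to be filled or skipped first, since the helper calls split() on each value.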
