How to remove blocks of text with different lengths from different texts in a character vector?


Rollo99

I have a character vector with 231 documents (231 rows by one column). Each document begins with a large block of text that I would like to remove. The problem is that the length of this block varies from document to document.

Take the following example, where each text starts with the block I want to remove (nothing I have tried so far has worked):

x <- c("Text that I wish to remove because I don't like it. I really want to remove the text but I cannot do it. I hope that stackoverflow will sort it out.", 
  "Text that I wish to remove. I really want to remove the text but I cannot do it. I hope that stackoverflow will sort it out.", 
  "Text that I wish to remove and I will remove it because some great data analyst will help me solve it. I really want to remove the text but I cannot do it. I hope that stackoverflow will sort it out.", 
  "Text that I wish to remove and who know whether I manage to make it work, it could be and it could not be. I really want to remove the text but I cannot do it. I hope that stackoverflow will sort it out.")

If the blocks to be removed were all the same length, I would simply do the following, as someone suggested in a previous post:

strings <- substring(x, 60)

However, I'm now stuck because the blocks have different lengths.

Ideally, I would like to get:

[1] "I really want to remove the text but I cannot do it. I hope that stackoverflow will sort it out."
[2] "I really want to remove the text but I cannot do it. I hope that stackoverflow will sort it out."
[3] "I really want to remove the text but I cannot do it. I hope that stackoverflow will sort it out."
[4] "I really want to remove the text but I cannot do it. I hope that stackoverflow will sort it out."

Can anyone help me?

Thank you very much!

Christian Gangan

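You can use gsub() with a regular expression. Note that .+ is greedy, so this removes everything up to the last period followed by a space: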

  gsub("^.+\\. ", "", x)

[1] "I hope that stackoverflow will sort it out."
[2] "I hope that stackoverflow will sort it out."
[3] "I hope that stackoverflow will sort it out."
[4] "I hope that stackoverflow will sort it out."
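
The output above drops one sentence more than the question asked for, because the greedy pattern runs to the last period. A minimal sketch of a variant that stops at the first period instead, assuming the block to remove is exactly the first sentence and contains no period of its own (no abbreviations or decimals); the two strings are shortened stand-ins for the real documents:

```r
# Two of the example documents from the question (stand-ins for all 231).
x <- c(
  "Text that I wish to remove because I don't like it. I really want to remove the text but I cannot do it. I hope that stackoverflow will sort it out.",
  "Text that I wish to remove. I really want to remove the text but I cannot do it. I hope that stackoverflow will sort it out."
)

# [^.]* matches a run of non-period characters, so the match ends at the
# FIRST period; sub() (unlike gsub()) replaces only that first match.
# Assumption: the unwanted block contains no period before its final one.
result <- sub("^[^.]*\\.\\s+", "", x)
print(result)
```

If the unwanted block can itself contain periods, this will cut too early; in that case a fixed marker that opens the wanted text (if one exists) would be a safer anchor for the pattern.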
