How can I get rid of the embedded NUL on the original vector?


Andres Mora

I am scraping an ASP.NET website.

The following request returns a raw vector (reporte_nacido) containing a CSV file that uses tabs as delimiters:

library(RCurl)

# params and curl are defined earlier in the session
reporte_nacido <- postForm('https://xxxxx/WebSiteNDE/BirthsPages/FiltrosExcelNac.aspx',
                           .params = params,
                           curl = curl,
                           .opts = RCurl::curlOptions(ssl.verifypeer = FALSE, verbose = TRUE))

If I open the file in a text viewer, it looks like this:

[Screenshot: the file opened in a text viewer]

Now I am trying to convert that raw vector to a character string in R, and I get the following error. I believe the file downloaded from the server is somehow corrupted, and R is picky about that:

rawToChar(as.vector(unlist(reporte_nacido)))

Error in rawToChar(as.vector(unlist(reporte_nacido))) : 
  embedded nul in string: '\xfe\xff\0N\0\xda\0M\0E\0R\0O\0 \0C\0E\0R\0T\0I\0F\0I\0C\0A\0D\0O\0\t\0D\0E\0P\0A\0R\0T\0A\0M\0E\0N\0T\0O\0\t\0M\0U\0N\0I\0C\0I\0P\0I\0O\0\t\0A\0R\0E\0A\0 \0N\0A\0C\0I\0M\0I\0E\0N\0T\0O\0\t\0I\0N\0S\0P\0E\0C\0C\0I\0O\0N\0 \0C\0O\0R\0R\0E\0G\0I\0M\0I\0E\0N\0T\0O\0 \0O\0 \0C\0A\0S\0E\0R\0I\0O\0 \0N\0A\0C\0I\0M\0I\0E\0N\0T\0O\0\t\0S\0I\0T\0I\0O\0 \0N\0A\0C\0I\0M\0I\0E\0N\0T\0O\0\t\0C\0\xd3\0D\0I\0G\0O\0 \0I\0N\0S\0T\0I\0T\0U\0C\0I\0\xd3\0N\0\t\0N\0O\0M\0B\0R\0E\0 \0I\0N\0S\0T\0I\0T\0U\0C\0I\0\xd3\0N\0\t\0S\0E\0X\0O\0\t\0P\0E\0S\0O\0 \0(\0G\0r\0a\0m\0o\0s\0)\0\t\0T\0A\0L\0L\0A\0 \0(\0C\0e\0n\0t\0\xed\0m\0e\0t\0r\0o\0s\0)\0\t\0F\0E\0C\0H\0A\0 \0N\0A\0C\0I\0M\0I\0E\0N\0T\0O\0\t\0H\0O\0R\0A\0 \0N\0A\0C\0I\0M\0I\0E\0N\0T\0O\0\t\0P\0A\0R\0T\0O\0 \0A\0T\0E\0N\0D\0I\0D\0O\0 \0P\0O\0R\0\t\0T\0I\0E\0M\0P\0O\0 \0D\0E\0 \0G\0E\0S\0T\0A\0C\0I\0\xd3\0N\0\t\0N\0\xda\0M\0E\0R\0O\0 \0C\0O\0N\0S\0U\0L\0T\0A\0S\0 \0P\0R\0E\0N\0A\0T\0A\0L\0E\0S\0\t\0T\0I\0P\0O\0 \0P\0A

Alan Cameron

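The file is not corrupted: the raw vector you get is text encoded as UTF-16. The leading bytes \xfe\xff are a big-endian byte order mark, and each character is stored as two bytes, so every ASCII character carries a NUL byte, which rawToChar() rejects. You can convert it like this: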

library(stringi)

raw_vec <- as.vector(unlist(reporte_nacido))

decoded <- stri_encode(raw_vec, "UTF16")

decoded
#> [1] "NÚMERO CERTIFICADO\tDEPARTAMENTO\tMUNICIPIO\tAREA NACIMIENTO\tINSPECCION CORREGIMIENTO O CASERIO NACIMIENTO\tSITIO NACIMIENTO\tCÓDIGO INSTITUCIÓN\tNOMBRE INSTITUCIÓN\tSEXO\tPESO (Gramos)\tTALLA (Centímetros)\tFECHA NACIMIENTO\tHORA NACIMIENTO\tPARTO ATENDIDO POR\tTIEMPO DE GESTACIÓN\tNÚMERO CONSULTAS PRENATALES\tTIPO PA"

The data is tab-delimited rather than comma-separated, so you might want to read it like this:

read.table(text = decoded, sep = "\t", header = TRUE)
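
If you would rather not depend on stringi, base R's iconv() can do the same conversion, since it accepts a list of raw vectors. The sketch below is only an illustration under some assumptions: sample_raw is a made-up stand-in for reporte_nacido, and the data is assumed to be big-endian UTF-16 (as the \xfe\xff byte order mark in the error message suggests). The byte order mark is dropped by hand so the result does not depend on how the platform's iconv treats it.

```r
# Hypothetical stand-in for as.vector(unlist(reporte_nacido)):
# byte order mark FE FF, then "A\tB\n1\t2\n" in big-endian UTF-16.
sample_raw <- as.raw(c(
  0xfe, 0xff,
  0x00, 0x41, 0x00, 0x09, 0x00, 0x42, 0x00, 0x0a,
  0x00, 0x31, 0x00, 0x09, 0x00, 0x32, 0x00, 0x0a
))

# Drop the big-endian byte order mark if it is present.
has_be_bom <- length(sample_raw) >= 2 &&
  identical(sample_raw[1:2], as.raw(c(0xfe, 0xff)))
payload <- if (has_be_bom) sample_raw[-(1:2)] else sample_raw

# iconv() converts each raw element of the list to one string.
decoded_base <- iconv(list(payload), from = "UTF-16BE", to = "UTF-8")

# Parse the tab-delimited text, as in the answer above.
df <- read.table(text = decoded_base, sep = "\t", header = TRUE)
```

With the real data you would replace sample_raw with as.vector(unlist(reporte_nacido)); the encoding names that iconv() supports vary by platform, so check iconvlist() if "UTF-16BE" is not recognised.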
