Argument "into" is missing when separating a column (tidyr)

Zhao Hongfeng

I constructed metadata for 10 papers. The dput() output is shown below:

> dput(itemlist)
structure(list(title = c("钱学森工程科学思想的实践者 [科普文章]", 
"超高周疲劳裂纹萌生与初始扩展的特征尺度 [科普文章]", "Proceedings of International conference on Airworthiness & Fatigue – 7th ICSAELS Series Conference [期刊论文]", 
"一种热机械疲劳实验的装置和方法 [专利]", "IUTAM和ICTAM的起源和历程 [科普文章]", 
"加载频率对金属材料超高周疲劳性能的影响 [会议论文]", "金属材料超高周疲劳行为的Monte-Carlo模拟 [会议论文]", 
"Vibration behavior and response to an accidental collision of SFT prototype in Qiandao Lake (China) [会议论文]", 
"A simulation on microstructure sensitivity to very-high-cycle fatigue behavior of metallic materials [会议论文]", 
"Effect of traveling wave on vortex-induced vibrations of submerged floating tunnel tethers [会议论文]"
), publish = c("2014", "2014", " 2013", "专利类型: 发明专利, 专利号: ZL2009102374751, 申请日期: 2012, 公开日期: 2012-12-27", 
"2012", "第十五届全国疲劳与断裂学术会议摘要及论文集, 中国广东佛山", 
"第十五届全国疲劳与断裂学术会议摘要及论文集, 中国广东佛山", "The 1st International Symposium on Archimedes Bridge, Qiandao Lake, China, 2010-10", 
"The 1st International Symposium on Archimedes Bridge, Qiandao Lake, China, 2010-10", 
"The 1st International Symposium on Archimedes Bridge, Qiandao Lake, China, 2010-10"
), author = c("丁雁生; 洪友士; 金和", "洪友士; 中国科学院老科技工作者协会工程力学分会", 
"Sih G C; Hong YS(洪友士)", "谢季佳; 赵爱国; 武晓东; 洪友士", 
"陈杰; 刘洋; 汤亚南; 洪友士", "赵爱国; 洪友士; 谢季佳", "雷铮强; 洪友士; 谢季佳; 赵爱国", 
"Zhang SY(张双寅); Wang L(王雷); Hong YS(洪友士)", "Lei ZQ(雷铮强); Xie JJ(谢季佳); Zhao AG(赵爱国); Hong YS(洪友士)", 
"Wu XD(武晓东); Ge F(葛斐); Hong YS(洪友士)")), .Names = c("title", 
"publish", "author"), row.names = c(NA, 10L), class = "data.frame")

I found that tidyr can split a column so that each element gets its own row. In this example, I split "author" into separate rows:

> dput(itemlist_tidy)
structure(list(title = c("钱学森工程科学思想的实践者 [科普文章]", 
"钱学森工程科学思想的实践者 [科普文章]", "钱学森工程科学思想的实践者 [科普文章]", 
"超高周疲劳裂纹萌生与初始扩展的特征尺度 [科普文章]", "超高周疲劳裂纹萌生与初始扩展的特征尺度 [科普文章]", 
"Proceedings of International conference on Airworthiness & Fatigue – 7th ICSAELS Series Conference [期刊论文]", 
"Proceedings of International conference on Airworthiness & Fatigue – 7th ICSAELS Series Conference [期刊论文]", 
"一种热机械疲劳实验的装置和方法 [专利]", "一种热机械疲劳实验的装置和方法 [专利]", 
"一种热机械疲劳实验的装置和方法 [专利]", "一种热机械疲劳实验的装置和方法 [专利]", 
"IUTAM和ICTAM的起源和历程 [科普文章]", "IUTAM和ICTAM的起源和历程 [科普文章]", 
"IUTAM和ICTAM的起源和历程 [科普文章]", "IUTAM和ICTAM的起源和历程 [科普文章]", 
"加载频率对金属材料超高周疲劳性能的影响 [会议论文]", "加载频率对金属材料超高周疲劳性能的影响 [会议论文]", 
"加载频率对金属材料超高周疲劳性能的影响 [会议论文]", "金属材料超高周疲劳行为的Monte-Carlo模拟 [会议论文]", 
"金属材料超高周疲劳行为的Monte-Carlo模拟 [会议论文]", "金属材料超高周疲劳行为的Monte-Carlo模拟 [会议论文]", 
"金属材料超高周疲劳行为的Monte-Carlo模拟 [会议论文]", "Vibration behavior and response to an accidental collision of SFT prototype in Qiandao Lake (China) [会议论文]", 
"Vibration behavior and response to an accidental collision of SFT prototype in Qiandao Lake (China) [会议论文]", 
"Vibration behavior and response to an accidental collision of SFT prototype in Qiandao Lake (China) [会议论文]", 
"A simulation on microstructure sensitivity to very-high-cycle fatigue behavior of metallic materials [会议论文]", 
"A simulation on microstructure sensitivity to very-high-cycle fatigue behavior of metallic materials [会议论文]", 
"A simulation on microstructure sensitivity to very-high-cycle fatigue behavior of metallic materials [会议论文]", 
"A simulation on microstructure sensitivity to very-high-cycle fatigue behavior of metallic materials [会议论文]", 
"Effect of traveling wave on vortex-induced vibrations of submerged floating tunnel tethers [会议论文]", 
"Effect of traveling wave on vortex-induced vibrations of submerged floating tunnel tethers [会议论文]", 
"Effect of traveling wave on vortex-induced vibrations of submerged floating tunnel tethers [会议论文]"
), publish = c("2014", "2014", "2014", "2014", "2014", " 2013", 
" 2013", "专利类型: 发明专利, 专利号: ZL2009102374751, 申请日期: 2012, 公开日期: 2012-12-27", 
"专利类型: 发明专利, 专利号: ZL2009102374751, 申请日期: 2012, 公开日期: 2012-12-27", 
"专利类型: 发明专利, 专利号: ZL2009102374751, 申请日期: 2012, 公开日期: 2012-12-27", 
"专利类型: 发明专利, 专利号: ZL2009102374751, 申请日期: 2012, 公开日期: 2012-12-27", 
"2012", "2012", "2012", "2012", "第十五届全国疲劳与断裂学术会议摘要及论文集, 中国广东佛山", 
"第十五届全国疲劳与断裂学术会议摘要及论文集, 中国广东佛山", "第十五届全国疲劳与断裂学术会议摘要及论文集, 中国广东佛山", 
"第十五届全国疲劳与断裂学术会议摘要及论文集, 中国广东佛山", "第十五届全国疲劳与断裂学术会议摘要及论文集, 中国广东佛山", 
"第十五届全国疲劳与断裂学术会议摘要及论文集, 中国广东佛山", "第十五届全国疲劳与断裂学术会议摘要及论文集, 中国广东佛山", 
"The 1st International Symposium on Archimedes Bridge, Qiandao Lake, China, 2010-10", 
"The 1st International Symposium on Archimedes Bridge, Qiandao Lake, China, 2010-10", 
"The 1st International Symposium on Archimedes Bridge, Qiandao Lake, China, 2010-10", 
"The 1st International Symposium on Archimedes Bridge, Qiandao Lake, China, 2010-10", 
"The 1st International Symposium on Archimedes Bridge, Qiandao Lake, China, 2010-10", 
"The 1st International Symposium on Archimedes Bridge, Qiandao Lake, China, 2010-10", 
"The 1st International Symposium on Archimedes Bridge, Qiandao Lake, China, 2010-10", 
"The 1st International Symposium on Archimedes Bridge, Qiandao Lake, China, 2010-10", 
"The 1st International Symposium on Archimedes Bridge, Qiandao Lake, China, 2010-10", 
"The 1st International Symposium on Archimedes Bridge, Qiandao Lake, China, 2010-10"
), author = c("丁雁生", " 洪友士", " 金和", "洪友士", " 中国科学院老科技工作者协会工程力学分会", 
"Sih G C", " Hong YS(洪友士)", "谢季佳", " 赵爱国", " 武晓东", 
" 洪友士", "陈杰", " 刘洋", " 汤亚南", " 洪友士", "赵爱国", " 洪友士", 
" 谢季佳", "雷铮强", " 洪友士", " 谢季佳", " 赵爱国", "Zhang SY(张双寅)", 
" Wang L(王雷)", " Hong YS(洪友士)", "Lei ZQ(雷铮强)", " Xie JJ(谢季佳)", 
" Zhao AG(赵爱国)", " Hong YS(洪友士)", "Wu XD(武晓东)", " Ge F(葛斐)", 
" Hong YS(洪友士)")), row.names = c(NA, -32L), class = "data.frame", .Names = c("title", 
"publish", "author"))

My focus is on the "author" column:

Authors are separated by a semicolon (';').
Not all papers have the same number of authors.

Now, I want to split the "author" column into different columns in order to plot the co-authors via igraph. It looks like "tidyr" is the best option, but it doesn't work:

> library(tidyr)
> v_t <- separate(itemlist, col="author", sep = ";", remove = TRUE, convert = FALSE)
Error in simplifyPieces(pieces, n, fill == "left") : 
  argument "into" is missing, with no default

I can't understand the exact meaning of the error message. What conditions need to be met to split "author" into many columns? Since tidyr can split values into rows or columns, I assume there must be a way to do this. Is there anything I should be aware of?

robot

separate() requires the into argument. It should be a character vector with the names of the variables to be created. Your call does not include this argument.

Adapted example from help file:

library(dplyr)
library(tidyr)
df <- data.frame(x = c(NA, "a.b", "a.d", "b.c"))
separate(data = df, col = x, into = c("A", "B"))
     A    B
1 <NA> <NA>
2    a    b
3    a    d
4    b    c
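
Note that the example above relies on the default sep, which is a regular expression matching any run of non-alphanumeric characters. Author strings that contain spaces or parentheses would be split too finely by that default, so it is probably safer to pass the delimiter explicitly. A minimal sketch on a made-up two-row data frame (toy, toy_split and the author_1/author_2 column names are placeholders, not the asker's data):

```r
library(tidyr)

# Hypothetical toy data: the names contain spaces, so the default sep would over-split
toy <- data.frame(author = c("Sih G C; Hong YS", "Wang L; Ge F"),
                  stringsAsFactors = FALSE)

# Split only on a semicolon followed by optional whitespace
toy_split <- separate(toy, col = author,
                      into = c("author_1", "author_2"),
                      sep = ";\\s*")
toy_split
```

Using ";\\s*" instead of plain ";" also drops the space after each semicolon, so the new columns do not start with a blank.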

You can use str_count() from stringr to determine the maximum number of authors in the author column, and then use that number to specify how many columns separate() should create. I used this Q&A as inspiration: Use Tidyr's "separate" to separate a string into columns, then create a new column with the count

Here is an example on a simplified dataset:

df <- data.frame(id = c(1,2,3), 
             author = c("name1; name2; name3", 
                        "name1; name2", "name1"))

df
  id              author
1  1 name1; name2; name3
2  2        name1; name2
3  3               name1
library(tidyr)
library(stringr)
str_count(df$author, ";")
[1] 2 1 0
max_n_authors <- max(str_count(df$author, ";")) + 1
max_n_authors
[1] 3
paste("author", 1:max_n_authors)
[1] "author 1" "author 2" "author 3"
df <- df %>% 
    separate(., col = author, into = paste("author", 1:max_n_authors))
Warning message:
Too few values at 2 locations: 2, 3 
df
  id author 1 author 2 author 3
1  1    name1    name2    name3
2  2    name1    name2     <NA>
3  3    name1     <NA>     <NA>
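
The warning appears because rows 2 and 3 contain fewer pieces than there are target columns. The fill argument of separate() controls how such rows are padded. Below is a sketch of the same approach with an explicit delimiter and fill = "right", again on made-up data (papers, n_max, wide and the author_N column names are illustrative assumptions):

```r
library(tidyr)
library(stringr)

# Hypothetical data with a varying number of authors per row
papers <- data.frame(id = c(1, 2, 3),
                     author = c("Lei ZQ; Xie JJ; Hong YS", "Wu XD; Ge F", "Hong YS"),
                     stringsAsFactors = FALSE)

# Number of semicolons + 1 = number of authors in the longest row
n_max <- max(str_count(papers$author, ";")) + 1

# fill = "right" pads short rows with NA on the right without a warning
wide <- separate(papers, col = author,
                 into = paste0("author_", seq_len(n_max)),
                 sep = ";\\s*", fill = "right")
wide
```

If a long layout (one row per author) suits the igraph step better, separate_rows(papers, author, sep = ";\\s*") should give that shape directly.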
