Argument "into" is missing in tidyr's separate()


Zhao Hongfeng

I built a metadata table for 10 papers. The dput() output is shown below:

> dput(itemlist)
structure(list(title = c("钱学森工程科学思想的实践者 [科普文章]", 
"超高周疲劳裂纹萌生与初始扩展的特征尺度 [科普文章]", "Proceedings of International conference on Airworthiness & Fatigue – 7th ICSAELS Series Conference [期刊论文]", 
"一种热机械疲劳实验的装置和方法 [专利]", "IUTAM和ICTAM的起源和历程 [科普文章]", 
"加载频率对金属材料超高周疲劳性能的影响 [会议论文]", "金属材料超高周疲劳行为的Monte-Carlo模拟 [会议论文]", 
"Vibration behavior and response to an accidental collision of SFT prototype in Qiandao Lake (China) [会议论文]", 
"A simulation on microstructure sensitivity to very-high-cycle fatigue behavior of metallic materials [会议论文]", 
"Effect of traveling wave on vortex-induced vibrations of submerged floating tunnel tethers [会议论文]"
), publish = c("2014", "2014", " 2013", "专利类型: 发明专利, 专利号: ZL2009102374751, 申请日期: 2012, 公开日期: 2012-12-27", 
"2012", "第十五届全国疲劳与断裂学术会议摘要及论文集, 中国广东佛山", 
"第十五届全国疲劳与断裂学术会议摘要及论文集, 中国广东佛山", "The 1st International Symposium on Archimedes Bridge, Qiandao Lake, China, 2010-10", 
"The 1st International Symposium on Archimedes Bridge, Qiandao Lake, China, 2010-10", 
"The 1st International Symposium on Archimedes Bridge, Qiandao Lake, China, 2010-10"
), author = c("丁雁生; 洪友士; 金和", "洪友士; 中国科学院老科技工作者协会工程力学分会", 
"Sih G C; Hong YS(洪友士)", "谢季佳; 赵爱国; 武晓东; 洪友士", 
"陈杰; 刘洋; 汤亚南; 洪友士", "赵爱国; 洪友士; 谢季佳", "雷铮强; 洪友士; 谢季佳; 赵爱国", 
"Zhang SY(张双寅); Wang L(王雷); Hong YS(洪友士)", "Lei ZQ(雷铮强); Xie JJ(谢季佳); Zhao AG(赵爱国); Hong YS(洪友士)", 
"Wu XD(武晓东); Ge F(葛斐); Hong YS(洪友士)")), .Names = c("title", 
"publish", "author"), row.names = c(NA, 10L), class = "data.frame")

I found that tidyr can split a column so that each element gets its own row. In this example, I split "author" into separate rows:

> dput(itemlist_tidy)
structure(list(title = c("钱学森工程科学思想的实践者 [科普文章]", 
"钱学森工程科学思想的实践者 [科普文章]", "钱学森工程科学思想的实践者 [科普文章]", 
"超高周疲劳裂纹萌生与初始扩展的特征尺度 [科普文章]", "超高周疲劳裂纹萌生与初始扩展的特征尺度 [科普文章]", 
"Proceedings of International conference on Airworthiness & Fatigue – 7th ICSAELS Series Conference [期刊论文]", 
"Proceedings of International conference on Airworthiness & Fatigue – 7th ICSAELS Series Conference [期刊论文]", 
"一种热机械疲劳实验的装置和方法 [专利]", "一种热机械疲劳实验的装置和方法 [专利]", 
"一种热机械疲劳实验的装置和方法 [专利]", "一种热机械疲劳实验的装置和方法 [专利]", 
"IUTAM和ICTAM的起源和历程 [科普文章]", "IUTAM和ICTAM的起源和历程 [科普文章]", 
"IUTAM和ICTAM的起源和历程 [科普文章]", "IUTAM和ICTAM的起源和历程 [科普文章]", 
"加载频率对金属材料超高周疲劳性能的影响 [会议论文]", "加载频率对金属材料超高周疲劳性能的影响 [会议论文]", 
"加载频率对金属材料超高周疲劳性能的影响 [会议论文]", "金属材料超高周疲劳行为的Monte-Carlo模拟 [会议论文]", 
"金属材料超高周疲劳行为的Monte-Carlo模拟 [会议论文]", "金属材料超高周疲劳行为的Monte-Carlo模拟 [会议论文]", 
"金属材料超高周疲劳行为的Monte-Carlo模拟 [会议论文]", "Vibration behavior and response to an accidental collision of SFT prototype in Qiandao Lake (China) [会议论文]", 
"Vibration behavior and response to an accidental collision of SFT prototype in Qiandao Lake (China) [会议论文]", 
"Vibration behavior and response to an accidental collision of SFT prototype in Qiandao Lake (China) [会议论文]", 
"A simulation on microstructure sensitivity to very-high-cycle fatigue behavior of metallic materials [会议论文]", 
"A simulation on microstructure sensitivity to very-high-cycle fatigue behavior of metallic materials [会议论文]", 
"A simulation on microstructure sensitivity to very-high-cycle fatigue behavior of metallic materials [会议论文]", 
"A simulation on microstructure sensitivity to very-high-cycle fatigue behavior of metallic materials [会议论文]", 
"Effect of traveling wave on vortex-induced vibrations of submerged floating tunnel tethers [会议论文]", 
"Effect of traveling wave on vortex-induced vibrations of submerged floating tunnel tethers [会议论文]", 
"Effect of traveling wave on vortex-induced vibrations of submerged floating tunnel tethers [会议论文]"
), publish = c("2014", "2014", "2014", "2014", "2014", " 2013", 
" 2013", "专利类型: 发明专利, 专利号: ZL2009102374751, 申请日期: 2012, 公开日期: 2012-12-27", 
"专利类型: 发明专利, 专利号: ZL2009102374751, 申请日期: 2012, 公开日期: 2012-12-27", 
"专利类型: 发明专利, 专利号: ZL2009102374751, 申请日期: 2012, 公开日期: 2012-12-27", 
"专利类型: 发明专利, 专利号: ZL2009102374751, 申请日期: 2012, 公开日期: 2012-12-27", 
"2012", "2012", "2012", "2012", "第十五届全国疲劳与断裂学术会议摘要及论文集, 中国广东佛山", 
"第十五届全国疲劳与断裂学术会议摘要及论文集, 中国广东佛山", "第十五届全国疲劳与断裂学术会议摘要及论文集, 中国广东佛山", 
"第十五届全国疲劳与断裂学术会议摘要及论文集, 中国广东佛山", "第十五届全国疲劳与断裂学术会议摘要及论文集, 中国广东佛山", 
"第十五届全国疲劳与断裂学术会议摘要及论文集, 中国广东佛山", "第十五届全国疲劳与断裂学术会议摘要及论文集, 中国广东佛山", 
"The 1st International Symposium on Archimedes Bridge, Qiandao Lake, China, 2010-10", 
"The 1st International Symposium on Archimedes Bridge, Qiandao Lake, China, 2010-10", 
"The 1st International Symposium on Archimedes Bridge, Qiandao Lake, China, 2010-10", 
"The 1st International Symposium on Archimedes Bridge, Qiandao Lake, China, 2010-10", 
"The 1st International Symposium on Archimedes Bridge, Qiandao Lake, China, 2010-10", 
"The 1st International Symposium on Archimedes Bridge, Qiandao Lake, China, 2010-10", 
"The 1st International Symposium on Archimedes Bridge, Qiandao Lake, China, 2010-10", 
"The 1st International Symposium on Archimedes Bridge, Qiandao Lake, China, 2010-10", 
"The 1st International Symposium on Archimedes Bridge, Qiandao Lake, China, 2010-10", 
"The 1st International Symposium on Archimedes Bridge, Qiandao Lake, China, 2010-10"
), author = c("丁雁生", " 洪友士", " 金和", "洪友士", " 中国科学院老科技工作者协会工程力学分会", 
"Sih G C", " Hong YS(洪友士)", "谢季佳", " 赵爱国", " 武晓东", 
" 洪友士", "陈杰", " 刘洋", " 汤亚南", " 洪友士", "赵爱国", " 洪友士", 
" 谢季佳", "雷铮强", " 洪友士", " 谢季佳", " 赵爱国", "Zhang SY(张双寅)", 
" Wang L(王雷)", " Hong YS(洪友士)", "Lei ZQ(雷铮强)", " Xie JJ(谢季佳)", 
" Zhao AG(赵爱国)", " Hong YS(洪友士)", "Wu XD(武晓东)", " Ge F(葛斐)", 
" Hong YS(洪友士)")), row.names = c(NA, -32L), class = "data.frame", .Names = c("title", 
"publish", "author"))

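The call that produced this long table is not shown above. A minimal sketch of one way to get such a table, assuming tidyr's separate_rows() was used; the data frame below is a made-up stand-in, not the real itemlist:

```r
library(tidyr)

# Hypothetical stand-in for itemlist: one row per paper, authors joined by ";"
papers <- data.frame(
  title  = c("Paper A", "Paper B"),
  author = c("x; y; z", "x; w"),
  stringsAsFactors = FALSE
)

# One row per (paper, author) pair. The regex ";\\s*" also swallows the
# space after each semicolon, so the names come out without leading blanks.
long <- separate_rows(papers, author, sep = ";\\s*")
long$author
```

With sep = ";" instead, the space after each semicolon stays attached to the following name, which matches the leading blanks in the output above.
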
My focus is on the "author" column:

  1. All authors are separated by semicolon (';')
  2. Not all papers have the same number of authors.

Now, I want to split the "author" column into different columns in order to plot the co-authors via igraph. It looks like "tidyr" is the best option, but it doesn't work:

> library(tidyr)
> v_t <- separate(itemlist, col="author", sep = ";", remove = TRUE, convert = FALSE)
Error in simplifyPieces(pieces, n, fill == "left") : 
  argument "into" is missing, with no default

I don't understand the error message. What conditions must be met to split "author" into multiple columns? Since tidyr can split a column into either rows or columns, I assume there is a way to do this. Is there anything I should be aware of?

robot

separate() requires the argument into. It should be a character vector with the names of the variables to be created. Your call does not include this argument.

Adapted example from help file:

library(dplyr)
library(tidyr)
df <- data.frame(x = c(NA, "a.b", "a.d", "b.c"))
separate(data = df, col = x, into = c("A", "B"))
     A    B
1 <NA> <NA>
2    a    b
3    a    d
4    b    c
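
Note that the example above relies on the default sep, which splits on any run of non-alphanumeric characters. For author names that contain spaces, this is probably not what you want. A small sketch of the difference, using a made-up one-row data frame and hypothetical column names author_1 and author_2:

```r
library(tidyr)

# Made-up example: two authors, the first name contains spaces
df <- data.frame(author = "Sih G C; Hong YS", stringsAsFactors = FALSE)

# Default sep: splits on spaces as well as on the semicolon.
# extra = "drop" silently discards the pieces beyond the two requested columns.
by_default <- separate(df, author, into = c("author_1", "author_2"),
                       extra = "drop")

# Explicit sep: split only on a semicolon followed by optional whitespace
by_semicolon <- separate(df, author, into = c("author_1", "author_2"),
                         sep = ";\\s*")

by_default    # author_1 = "Sih",     author_2 = "G"
by_semicolon  # author_1 = "Sih G C", author_2 = "Hong YS"
```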

You can use str_count() from stringr to determine the maximum number of authors in the author column, and then use that number to set how many columns separate() creates. I used this Q&A as inspiration: Use Tidyr's "separate" to separate a string into columns, then create a new column with the count

Here is an example from a simplified dataset:

df <- data.frame(id = c(1,2,3), 
             author = c("name1; name2; name3", 
                        "name1; name2", "name1"))

df
  id              author
1  1 name1; name2; name3
2  2        name1; name2
3  3               name1
library(tidyr)
library(stringr)
str_count(df$author, ";")
[1] 2 1 0
max_n_authors <- max(str_count(df$author, ";")) + 1
max_n_authors
[1] 3
paste("author", 1:max_n_authors)
[1] "author 1" "author 2" "author 3"
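# Split on the semicolon explicitly; the default sep would also split on spaces inside names
df <- df %>% 
    separate(col = author, into = paste("author", 1:max_n_authors), sep = ";\\s*")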
Warning message:
Too few values at 2 locations: 2, 3 
df
  id author 1 author 2 author 3
1  1    name1    name2    name3
2  2    name1    name2     <NA>
3  3    name1     <NA>     <NA>
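
The same approach can be sketched for names that look more like yours. This is only an illustration under some assumptions: the three rows are simplified ASCII stand-ins for your data, the column names author_1, author_2, ... are my own choice (built with paste0() so they contain no spaces), and fill = "right" is used to pad short rows with NA without a warning:

```r
library(tidyr)
library(stringr)

# Simplified stand-in for itemlist: authors joined by "; "
papers <- data.frame(
  id     = 1:3,
  author = c("Sih G C; Hong YS", "Zhang SY; Wang L; Hong YS", "Hong YS"),
  stringsAsFactors = FALSE
)

# Number of semicolons + 1 = number of authors; take the maximum over all rows
max_n_authors <- max(str_count(papers$author, ";")) + 1

# Syntactic column names: author_1, author_2, ...
author_cols <- paste0("author_", seq_len(max_n_authors))

wide <- separate(papers, author, into = author_cols,
                 sep = ";\\s*",    # split only on semicolons
                 fill = "right")   # pad rows with fewer authors with NA
wide
```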
