Efficiently create dataframe from strings containing key-value pairs

Here you go:

Recreate the data:

x <- c(
  "HGVSc=ENST00000495576.1:n.820-1G>A;INTRON=1//1;CANONICAL=YES",
  "DISTANCE=2179",
  "HGVSc=ENST00000466430.1:n.911C>T;EXON=4//4;CANONICAL=YES",
  "DISTANCE=27;CANONICAL=YES;common"
)

Create a named vector that maps each of your desired field names to its column position. This is used for fast lookup later:

names <- setNames(1:15, c('ENSP','HGVS','DOMAINS','EXON','INTRON', 'HGVSp', 'HGVSc','CANONICAL','GMAF','DISTANCE', 'HGNC', 'CCDS', 'SIFT', 'PolyPhen', 'common'))
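To see why the named vector helps: indexing it with a character vector of keys returns all of their column positions in one vectorised lookup, which is what the helper function relies on to place each value. A small sketch (the keys looked up here are just examples):

```r
# Named integer vector: field name -> column position (same as above)
names <- setNames(1:15, c('ENSP', 'HGVS', 'DOMAINS', 'EXON', 'INTRON', 'HGVSp', 'HGVSc',
                          'CANONICAL', 'GMAF', 'DISTANCE', 'HGNC', 'CCDS', 'SIFT',
                          'PolyPhen', 'common'))

# Look up several keys at once; the result keeps the keys as its names
pos <- names[c("HGVSc", "INTRON", "CANONICAL")]
print(pos)

# A key that is not in the vector gives NA rather than an error
missing_pos <- names["NOT_A_FIELD"]
print(missing_pos)
```

Because the lookup returns NA for unknown keys, any key outside the list of field names would need handling before it is used as an index.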

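Create a helper function that places each value at the correct position in a row (a key given without a value, such as common, is set to "YES"). Then use lapply and strsplit to split each record into key/value pairs, and sapply to stack the rows into a matrix: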

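# Note: this helper's name masks base::assign
assign <- function(x, names) {
  # One column per key/value pair; a bare key (no "=") is padded with "YES"
  xx <- sapply(x, function(i) if (length(i) == 2L) i else c(i, "YES"))
  # Start from an empty row, then drop each value into the column its key maps to
  z <- rep(NA, length(names))
  z[names[xx[1, ]]] <- xx[2, ]
  z
}
# Split each record on ";", then each pair on "="
sx <- lapply(strsplit(x, ";"), strsplit, "=")
# One row per record
ret <- t(sapply(sx, assign, names))
colnames(ret) <- names(names)
ret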

The results:

     ENSP HGVS DOMAINS EXON   INTRON HGVSp HGVSc                          CANONICAL GMAF DISTANCE HGNC
[1,] NA   NA   NA      NA     "1//1" NA    "ENST00000495576.1:n.820-1G>A" "YES"     NA   NA       NA
[2,] NA   NA   NA      NA     NA     NA    NA                             NA        NA   "2179"   NA
[3,] NA   NA   NA      "4//4" NA     NA    "ENST00000466430.1:n.911C>T"   "YES"     NA   NA       NA
[4,] NA   NA   NA      NA     NA     NA    NA                             "YES"     NA   "27"     NA
     CCDS SIFT PolyPhen common
[1,] NA   NA   NA       NA
[2,] NA   NA   NA       NA
[3,] NA   NA   NA       NA
[4,] NA   NA   NA       "YES"
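The question asks for a data frame, while the code above returns a character matrix. A minimal sketch of the conversion, assuming `ret` is built exactly as above (it is recreated here so the block runs on its own); converting DISTANCE to an integer column is an illustrative extra step, not part of the original answer.

```r
x <- c("HGVSc=ENST00000495576.1:n.820-1G>A;INTRON=1//1;CANONICAL=YES",
       "DISTANCE=2179",
       "HGVSc=ENST00000466430.1:n.911C>T;EXON=4//4;CANONICAL=YES",
       "DISTANCE=27;CANONICAL=YES;common")
names <- setNames(1:15, c('ENSP', 'HGVS', 'DOMAINS', 'EXON', 'INTRON', 'HGVSp', 'HGVSc',
                          'CANONICAL', 'GMAF', 'DISTANCE', 'HGNC', 'CCDS', 'SIFT',
                          'PolyPhen', 'common'))
assign <- function(x, names) {
  xx <- sapply(x, function(i) if (length(i) == 2L) i else c(i, "YES"))
  z <- rep(NA, length(names))
  z[names[xx[1, ]]] <- xx[2, ]
  z
}
sx <- lapply(strsplit(x, ";"), strsplit, "=")
ret <- t(sapply(sx, assign, names))
colnames(ret) <- names(names)

# Character matrix -> data frame, keeping strings as character (not factors)
df <- as.data.frame(ret, stringsAsFactors = FALSE)
# Illustrative: numeric fields can then be converted column by column
df$DISTANCE <- as.integer(df$DISTANCE)
str(df[, c("HGVSc", "CANONICAL", "DISTANCE")])
```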


Here's another, faster, solution taking advantage of the original pairings...

##                   test elapsed replications relative average
## 2    thell_solution(x)    0.37         1000    1.000 0.00037
## 3   andrie_solution(x)    1.04         1000    2.811 0.00104
## 1 original_solution(x)    2.61         1000    7.054 0.00261
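As a reading aid for the table: the numbers are consistent with `relative` being each test's elapsed time divided by the fastest one, and `average` being elapsed time per replication. The block below only recomputes those two derived columns from the elapsed times copied out of the table; it does not rerun the benchmark.

```r
# Elapsed seconds for 1000 replications, copied from the table above
elapsed <- c(thell = 0.37, andrie = 1.04, original = 2.61)
replications <- 1000

# Speed relative to the fastest test, rounded to 3 places as in the table
relative <- round(elapsed / min(elapsed), 3)
# Mean seconds per call
average <- elapsed / replications
print(relative)
print(average)
```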

Since pairing[1] always gets assigned pairing[2], except for the final boolean flag (not that I understand why that one flag is treated differently in the original string vector), we can take advantage of the key/value sequence. When a name is given without a value, indexing one past the end of the split vector simply returns NA (for the fourth string, which splits into five pieces, x[6] is NA), so that flag can be detected and set to "YES". We also have no need to call names multiple times. And since strsplit uses regular expressions, we can split on both ";" and "=" in a single pass using alternation.
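Both tricks, regex alternation in strsplit and reading past the end of a vector to get NA, can be seen on the fourth record alone. A small sketch; it uses seq(1, n, by = 2) for the key positions, which gives the same indices as the seq.int call in the answer.

```r
# Fourth record from the example data: it ends in the bare flag "common"
s <- "DISTANCE=27;CANONICAL=YES;common"

# One regex with alternation splits on either delimiter in a single pass
parts <- strsplit(s, "(;|=)")[[1]]
print(parts)  # keys and values interleaved

# Keys sit at the odd positions 1, 3, 5, ...
key_pos <- seq(1, length(parts), by = 2)
keys <- parts[key_pos]
# Each value follows its key; for the trailing flag this reads past the end, giving NA
vals <- parts[key_pos + 1]
vals[is.na(vals)] <- "YES"

rec <- setNames(vals, keys)
print(rec)
```

This only works when the bare flag is the last item in the record, which is the assumption the answer states.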

# Let `x` be as @Andrie made it in his answer. Let `names` be the character
# vector of field names you had in the original question.
# A pre-built dummy record: one NA per field, named by field.
na.record <- setNames(rep(NA_character_, times = length(names)), names)

do.call(rbind, lapply(strsplit(x, "(;|=)"), FUN = function(x) {
    # Keys are at the odd positions 1, 3, 5, ...
    x_seq <- seq.int(to = length(x), by = 2)
    # Each value follows its key; a trailing bare flag reads past the end and gives NA
    vals <- x[x_seq + 1]
    vals[is.na(vals)] <- "YES"
    # Assigning a character vector (not a list) keeps the record, and the result, character
    na.record[x[x_seq]] <- vals
    na.record
}))
##      ENSP HGVS DOMAINS EXON   INTRON HGVSp HGVSc
## [1,] NA   NA   NA      NA     "1//1" NA    "ENST00000495576.1:n.820-1G>A"
## [2,] NA   NA   NA      NA     NA     NA    NA
## [3,] NA   NA   NA      "4//4" NA     NA    "ENST00000466430.1:n.911C>T"
## [4,] NA   NA   NA      NA     NA     NA    NA
##      CANONICAL GMAF DISTANCE HGNC CCDS SIFT PolyPhen common
## [1,] "YES"     NA   NA       NA   NA   NA   NA       NA
## [2,] NA        NA   "2179"   NA   NA   NA   NA       NA
## [3,] "YES"     NA   NA       NA   NA   NA   NA       NA
## [4,] "YES"     NA   "27"     NA   NA   NA   NA       "YES"
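Both answers still call an R function once per record. A further variation, not taken from either answer and offered only as a sketch that has not been benchmarked here, splits all records in one pass, remembers which record each key=value pair came from, and fills a pre-allocated character matrix with a single two-column matrix index (m[cbind(row, col)] <- value). Like the answers above, it assumes every key appears in the field list (the hypothetical `fields` vector below) and that a bare key is a flag meaning "YES"; unlike the seq.int version, the flag may appear anywhere in the record.

```r
x <- c("HGVSc=ENST00000495576.1:n.820-1G>A;INTRON=1//1;CANONICAL=YES",
       "DISTANCE=2179",
       "HGVSc=ENST00000466430.1:n.911C>T;EXON=4//4;CANONICAL=YES",
       "DISTANCE=27;CANONICAL=YES;common")
fields <- c('ENSP', 'HGVS', 'DOMAINS', 'EXON', 'INTRON', 'HGVSp', 'HGVSc',
            'CANONICAL', 'GMAF', 'DISTANCE', 'HGNC', 'CCDS', 'SIFT',
            'PolyPhen', 'common')

# Split every record into its key=value pairs
pairs <- strsplit(x, ";", fixed = TRUE)
# Record number for every pair, then flatten the pairs
record_id <- rep(seq_along(pairs), lengths(pairs))
pairs <- unlist(pairs, use.names = FALSE)

# Key: everything before the first "=" (a bare flag is left unchanged)
key <- sub("=.*$", "", pairs)
# Value: everything after the first "=", or "YES" for a bare flag
has_eq <- grepl("=", pairs, fixed = TRUE)
val <- ifelse(has_eq, sub("^[^=]*=", "", pairs), "YES")

# Pre-allocate, then fill every cell in one vectorised assignment
m <- matrix(NA_character_, nrow = length(x), ncol = length(fields),
            dimnames = list(NULL, fields))
m[cbind(record_id, match(key, fields))] <- val
m
```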

