Unescape unicode in character string


After playing with this some more, I think the best I can do is to search for \uxxxx patterns using a regular expression, and then parse those using the R parser:

unescape_unicode <- function(x){
  # single string only
  stopifnot(is.character(x) && length(x) == 1)

  # find matches: one or more backslashes, then 'u' and four hex digits
  # (the character class is restricted to hex digits so parse() never sees an invalid escape)
  m <- gregexpr("(\\\\)+u[0-9a-f]{4}", x, ignore.case = TRUE)

  if(m[[1]][1] > -1){
    # parse each match as an R string literal, then re-escape any remaining backslashes
    p <- vapply(regmatches(x, m)[[1]], function(txt){
      gsub("\\", "\\\\", parse(text=paste0('"', txt, '"'))[[1]], fixed = TRUE, useBytes = TRUE)
    }, character(1), USE.NAMES = FALSE)

    # substitute parsed into original
    regmatches(x, m) <- list(p)
  }
  x
}

This seems to work for all the cases I have tried, and I haven't found any odd side effects yet.
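
As a variation on the approach above, here is a minimal sketch that avoids parse() and instead converts the four hex digits with strtoi() and intToUtf8(). The name unescape_unicode_simple is made up for this illustration. It is a simplification, not a drop-in replacement: it assumes every \uxxxx is preceded by a single backslash, so it does not preserve escaped backslashes the way the function above does, and it does not combine surrogate pairs.

```r
# Hypothetical helper for illustration; not part of the original answer.
# Assumes a single backslash before each 'u' and ignores surrogate pairs.
unescape_unicode_simple <- function(x) {
  # single string only, as in the original function
  stopifnot(is.character(x), length(x) == 1)

  # find literal \uXXXX sequences (backslash, 'u', four hex digits)
  m <- gregexpr("\\\\u[0-9a-fA-F]{4}", x)

  if (m[[1]][1] > -1) {
    # drop the leading backslash and 'u', keeping only the hex digits
    hex <- substring(regmatches(x, m)[[1]], 3)

    # convert each hex code point to the corresponding UTF-8 character
    chars <- vapply(hex, function(h) intToUtf8(strtoi(h, 16L)),
                    character(1), USE.NAMES = FALSE)

    # substitute the converted characters into the original string
    regmatches(x, m) <- list(chars)
  }
  x
}

unescape_unicode_simple("Z\\u00FCrich")
```

Because nothing is passed to the R parser, the input is never interpreted as R code, at the cost of the edge cases noted above.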


There is a function for this in the stringi package :)

require(stringi)
escaped <- "Z\\u00FCrich"
escaped
## [1] "Z\\u00FCrich"
stri_unescape_unicode(escaped)
## [1] "Zürich"


Maybe like this?

\"x\"\s:\s\"([^"]*?)\"

This does not look at the letters themselves. It just matches everything up to the next quote.
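
A rough sketch of how that pattern could be applied in R, assuming the goal is to pull out the value stored under the key "x". The sample string and variable names are invented for illustration. Note that the pattern expects exactly one whitespace character on each side of the colon, and that it would stop early at an escaped quote inside the value.

```r
# Illustrative input only: a JSON-like string with a still-escaped \u sequence
json <- "{\"x\" : \"Z\\u00FCrich\", \"y\" : \"other\"}"

# The regex from above, written as an R string literal
pattern <- "\"x\"\\s:\\s\"([^\"]*?)\""

# regexec() returns the full match followed by the capture groups
m <- regmatches(json, regexec(pattern, json, perl = TRUE))

# the first capture group is the raw (still escaped) value of "x"
value <- m[[1]][2]
value
```

The captured value still contains the literal \u00FC sequence, so it would need to be unescaped afterwards with one of the functions from the other answers.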