Unescape unicode in character string

regex json r unicode utf-8

After playing with this some more, I think the best I can do is to search for \uxxxx patterns with a regular expression and then parse the matches using the R parser:

unescape_unicode <- function(x){
  # single string only
  stopifnot(is.character(x) && length(x) == 1)
  # find matches: one or more backslashes, 'u', then four hex digits
  m <- gregexpr("(\\\\)+u[0-9a-f]{4}", x, ignore.case = TRUE)
  if(m[[1]][1] > -1){
    # parse each match as an R string literal, then re-escape any
    # backslashes that survive parsing
    p <- vapply(regmatches(x, m)[[1]], function(txt){
      gsub("\\", "\\\\", parse(text=paste0('"', txt, '"'))[[1]], fixed = TRUE, useBytes = TRUE)
    }, character(1), USE.NAMES = FALSE)
    # substitute parsed values into the original string
    regmatches(x, m) <- list(p)
  }
  x
}

This seems to work for all the cases I have tried, and I haven't found any odd side effects yet.
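For comparison, here is a minimal sketch of the same regex-based idea that avoids parse() by converting the hex digits directly. It is a swapped-in variant, not the function above: the name unescape_unicode_hex is hypothetical, it treats every \uXXXX it finds as an escape (so it does not preserve doubled backslashes), and it makes no attempt to combine surrogate pairs.

```r
# Hypothetical helper: unescape \uXXXX sequences without calling parse().
unescape_unicode_hex <- function(x) {
  stopifnot(is.character(x), length(x) == 1)
  # Match a backslash, 'u', and exactly four hex digits.
  m <- gregexpr("\\\\u[0-9a-fA-F]{4}", x)
  hits <- regmatches(x, m)[[1]]
  if (length(hits) > 0) {
    # Drop the leading "\u", read the hex digits as an integer code point,
    # and convert each code point to a UTF-8 character.
    codes <- strtoi(substring(hits, 3), base = 16L)
    regmatches(x, m) <- list(vapply(codes, intToUtf8, character(1)))
  }
  x
}

unescape_unicode_hex("Z\\u00FCrich")
```

Skipping the parser means the input is never evaluated as R syntax, at the cost of handling only the four-digit form.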


There is a function for this in the stringi package :)

require(stringi)
escaped <- "Z\\u00FCrich"
escaped
## [1] "Z\\u00FCrich"
stri_unescape_unicode(escaped)
## [1] "Zürich"
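Since stri_unescape_unicode is vectorized, a whole character vector can be processed in one call. A small sketch, assuming stringi is installed; the sample strings are made up for illustration:

```r
library(stringi)

# Made-up sample input: two escaped strings and one with nothing to unescape.
escaped <- c("Z\\u00FCrich", "caf\\u00E9", "plain")

# Each element is unescaped independently; the result has the same length.
unescaped <- stri_unescape_unicode(escaped)
unescaped
```

Note that the function also interprets other backslash escapes (such as \n), so it may change more than just the \uXXXX sequences.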


Maybe like this?

\"x\"\s:\s\"([^"]*?)\"

This does not look at the individual letters. It simply captures everything up to the next quote.
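A rough sketch of how that pattern could be applied in R, assuming the input is a JSON-like string with a key named "x" (the sample string is invented for illustration). As written, the pattern expects exactly one whitespace character on each side of the colon:

```r
# Invented sample input; the value contains a literal \u00FC escape.
json <- '{"x" : "Z\\u00FCrich", "y" : "other"}'

# Same pattern as above, written as an R string (backslashes doubled).
pattern <- '"x"\\s:\\s"([^"]*?)"'

# regexec() returns the full match followed by the capture groups.
hit <- regmatches(json, regexec(pattern, json, perl = TRUE))[[1]]
value <- hit[2]  # first capture group: the still-escaped value
value
```

The captured value is still escaped, so it would need to be passed through one of the unescaping functions afterwards.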
