
regexp_replace in Pyspark dataframe


You may want to look at the code from the Spark git repository for regexp_replace:

override def nullSafeEval(s: Any, p: Any, r: Any): Any = {
  if (!p.equals(lastRegex)) {
    // regex value changed
    lastRegex = p.asInstanceOf[UTF8String].clone()
    pattern = Pattern.compile(lastRegex.toString)
  }
  if (!r.equals(lastReplacementInUTF8)) {
    // replacement string changed
    lastReplacementInUTF8 = r.asInstanceOf[UTF8String].clone()
    lastReplacement = lastReplacementInUTF8.toString
  }
  val m = pattern.matcher(s.toString())
  result.delete(0, result.length())
  while (m.find) {
    m.appendReplacement(result, lastReplacement)
  }
  m.appendTail(result)
  UTF8String.fromString(result.toString)
}
  1. The above code accepts the expression as Any and then calls toString() on it.
  2. Finally, it converts the result back to a UTF8String:
UTF8String.fromString(result.toString)
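The matcher loop above (find each match, append the replacement, then append the tail after the last match) can be sketched in plain Python with the stdlib `re` module. This is an illustrative approximation, not Spark's implementation: it uses the replacement string literally and does not handle Java-style `$1` group references that `appendReplacement` supports.

```python
import re

def regexp_replace_sketch(s, p, r):
    """Rough Python equivalent of the Scala matcher loop."""
    pattern = re.compile(p)
    result = []
    last_end = 0
    for m in pattern.finditer(s):
        result.append(s[last_end:m.start()])  # text before the match
        result.append(r)                      # like appendReplacement (literal only)
        last_end = m.end()
    result.append(s[last_end:])               # like appendTail
    return "".join(result)
```

For example, `regexp_replace_sketch("foo123bar456", "[0-9]+", "#")` produces `"foo#bar#"`, matching what `regexp_replace(col, "[0-9]+", "#")` would do to that value in a PySpark dataframe.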

ref - spark-git