regexp_replace in Pyspark dataframe
You may wanted to look at the code from spark git for regexp_replace
-
override def nullSafeEval(s: Any, p: Any, r: Any): Any = { if (!p.equals(lastRegex)) { // regex value changed lastRegex = p.asInstanceOf[UTF8String].clone() pattern = Pattern.compile(lastRegex.toString) } if (!r.equals(lastReplacementInUTF8)) { // replacement string changed lastReplacementInUTF8 = r.asInstanceOf[UTF8String].clone() lastReplacement = lastReplacementInUTF8.toString } val m = pattern.matcher(s.toString()) result.delete(0, result.length()) while (m.find) { m.appendReplacement(result, lastReplacement) } m.appendTail(result) UTF8String.fromString(result.toString) }
- the above code accepts the expression as
Any
and then calltoString()
on it - At last, it is converting the result again in
toString
UTF8String.fromString(result.toString)
ref - spark-git