
regexp_replace in Pyspark dataframe


You may want to look at the code from the Spark git repository for regexp_replace:

override def nullSafeEval(s: Any, p: Any, r: Any): Any = {
  if (!p.equals(lastRegex)) {
    // regex value changed
    lastRegex = p.asInstanceOf[UTF8String].clone()
    pattern = Pattern.compile(lastRegex.toString)
  }
  if (!r.equals(lastReplacementInUTF8)) {
    // replacement string changed
    lastReplacementInUTF8 = r.asInstanceOf[UTF8String].clone()
    lastReplacement = lastReplacementInUTF8.toString
  }
  val m = pattern.matcher(s.toString())
  result.delete(0, result.length())
  while (m.find) {
    m.appendReplacement(result, lastReplacement)
  }
  m.appendTail(result)
  UTF8String.fromString(result.toString)
}
  1. The above code accepts the expression as Any and then calls toString() on it.
  2. Finally, it converts the result back to a UTF8String:
UTF8String.fromString(result.toString)
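The matcher loop above (find each match, append the replacement, then append the tail after the last match) can be sketched in plain Python with the stdlib `re` module. This is an illustrative approximation, not Spark's implementation: it uses the replacement string literally and does not handle Java-style `$1` group references that `appendReplacement` supports.

```python
import re

def regexp_replace_sketch(s, p, r):
    """Rough Python equivalent of the Scala matcher loop."""
    pattern = re.compile(p)
    result = []
    last_end = 0
    for m in pattern.finditer(s):
        result.append(s[last_end:m.start()])  # text before the match
        result.append(r)                      # like appendReplacement (literal only)
        last_end = m.end()
    result.append(s[last_end:])               # like appendTail
    return "".join(result)
```

For example, `regexp_replace_sketch("foo123bar456", "[0-9]+", "#")` produces `"foo#bar#"`, matching what `regexp_replace(col, "[0-9]+", "#")` would do to that value in a PySpark dataframe.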

ref - spark-git