Word boundary in Lucene regex

regex elasticsearch lucene

In ElasticSearch regex flavor, there is no direct equivalent to a word boundary. Initial \b is something like (^|[^A-Za-z0-9_]) if the word starts with a word char, and the trailing \b is like ($|[^A-Za-z0-9_]) if the word ends with a word char.

Thus, we need to make sure that there is a non-word char before and after word or start/end of string. Since the regex is anchored by default, all we need to make [^A-Za-z0-9_] optional at start/end of string is add .* beside and wrap with an optional grouping construct:

(.*[^A-Za-z0-9_])?word([^A-Za-z0-9_].*)?

Details

(.*[^A-Za-z0-9_])? - either start of string or any 0+ chars (but a line break char, else use (.|\n)*) and then any char but a word char (basically, it is start of string followed with 1 or 0 occurrences of the pattern inside the group)
word - a word
([^A-Za-z0-9_].*)? - an optional sequence of any char but a word char followed with any 0+ chars, followed by the end of string position (implicit in Lucene regex).

CodeHunter

Word boundary in Lucene regex

Recent Posts

How can I color dots in a xy scatterplot according to column value?

How to update a claim in ASP.NET Identity?

What does {0} mean when initializing an object?

Accessing members of items in a JSONArray with Java

How to log SQL statements in Spring Boot?

Powershell Get-WebSite name parameter is ignored

How to detect scroll to bottom of html element

Java synchronized method

How to test controllers with CodeIgniter?

Detect Visual Composer

Matplotlib: Specify format of floats for tick labels

Rails join a list of strings with commas and "and" before the last