Unicode support for non-English characters with Sqlite Full Text Search in Android

Supplemental Answer

I ended up doing what @CL recommended and was able to successfully implement Full Text Search with Unicode. These are the basic steps I followed:

1. Replace all Unicode characters (code points >= 128) that are not part of a word with a space character.
2. (Optional) Replace specific characters with more general ones. For example, ē, è, and é could all be replaced with e, if this kind of generalized search is desired. This is not required, but if you skip it, searching for é will only return documents containing é, and searching for e will only return documents containing e (not é).
3. Populate the virtual FTS table with the modified text produced by steps 1 and 2.
4. Populate your normal table with the unmodified text. The schema and the number of documents must of course match what you used for the FTS table, so that each document has the same id in both tables.
5. Link the virtual FTS table to your normal text table/column by declaring the normal table as its external content table. That way the FTS table stores only the full-text index built from the modified text (which maps terms to document ids), not a copy of the text itself.
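
Steps 1 and 2 could be sketched in Java roughly as follows. This is a minimal illustration, not the code the author used: the class name `FtsTextPreparer` is hypothetical, "part of a word" is assumed to mean a Unicode letter, digit, or combining mark, and step 2 is approximated by decomposing the text (NFD) and stripping the Combining Diacritical Marks block instead of using a hand-written replacement table. ASCII characters are left alone because SQLite's default `simple` tokenizer already treats ASCII punctuation as a separator.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

/**
 * Hypothetical helper that prepares text for the FTS table (steps 1 and 2).
 * Assumption: "part of a word" means a Unicode letter, digit or combining mark.
 */
final class FtsTextPreparer {

    // Combining accents (U+0300..U+036F) that NFD decomposition splits off.
    private static final Pattern DIACRITICS =
            Pattern.compile("\\p{InCombiningDiacriticalMarks}+");

    private FtsTextPreparer() {}

    /** Step 1: replace non-ASCII characters that are not part of a word with a space. */
    static String replaceNonWordChars(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (cp >= 128 && !isWordPart(cp)) {
                out.append(' ');
            } else {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    /** Step 2 (optional): fold accented letters to their base letter. */
    static String foldDiacritics(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        String stripped = DIACRITICS.matcher(decomposed).replaceAll("");
        // Recompose whatever is left (e.g. Hangul syllables that NFD split into jamo).
        return Normalizer.normalize(stripped, Normalizer.Form.NFC);
    }

    /** Text to insert into the FTS table; the normal table keeps the original. */
    static String prepare(String text) {
        return foldDiacritics(replaceNonWordChars(text));
    }

    private static boolean isWordPart(int cp) {
        if (Character.isLetterOrDigit(cp)) {
            return true;
        }
        int type = Character.getType(cp);
        return type == Character.NON_SPACING_MARK
                || type == Character.COMBINING_SPACING_MARK
                || type == Character.ENCLOSING_MARK;
    }
}
```

Letters without a canonical decomposition, such as ø or ł, are not folded by this approach, and it also strips marks from some non-Latin letters (Cyrillic й becomes и). If either matters, an explicit replacement table, as step 2 describes, gives more control.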

Please read Full text search example in Android for instructions on how to create the FTS table and link it to the normal table. This took a long time to figure out, but in the end it gave very fast full-text searches, even for a very large number of documents.
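
Steps 3 to 5 might look like the following SQL, kept as Java constants since that is how it would be passed to Android's `SQLiteDatabase`. This is a sketch under assumptions, not the schema from the linked post: the table names `documents` and `documents_fts` and the column `body` are hypothetical. The `content="documents"` option is FTS4's external content feature: the FTS table keeps only its index and reads column values from `documents` by rowid, so `docid` in the FTS table must equal `_id` in the normal table.

```java
/**
 * Hypothetical schema for the external content setup (steps 3 to 5).
 * "documents" holds the original text; "documents_fts" indexes the prepared
 * text and reads its column values back from "documents" by rowid.
 */
final class FtsSchema {

    // Normal table; _id INTEGER PRIMARY KEY is an alias for the rowid.
    static final String CREATE_DOCUMENTS =
            "CREATE TABLE documents (_id INTEGER PRIMARY KEY, body TEXT)";

    // FTS4 external content table: column names must match the content table.
    static final String CREATE_FTS =
            "CREATE VIRTUAL TABLE documents_fts USING fts4(content=\"documents\", body)";

    // Normal table: the unmodified text.
    static final String INSERT_DOCUMENT =
            "INSERT INTO documents (_id, body) VALUES (?, ?)";

    // FTS table: the same id, but the prepared (normalized) text.
    static final String INSERT_FTS =
            "INSERT INTO documents_fts (docid, body) VALUES (?, ?)";

    // Find matching ids in the index, then read the original text.
    static final String SEARCH =
            "SELECT _id, body FROM documents WHERE _id IN "
            + "(SELECT docid FROM documents_fts WHERE documents_fts MATCH ?)";

    /** Statements to run once, in this order (e.g. with SQLiteDatabase.execSQL). */
    static final String[] SCHEMA = {CREATE_DOCUMENTS, CREATE_FTS};

    private FtsSchema() {}
}
```

On Android each document would then be written twice with the same id, for example `db.execSQL(FtsSchema.INSERT_DOCUMENT, new Object[] {id, original})` followed by `db.execSQL(FtsSchema.INSERT_FTS, new Object[] {id, prepared})`. Because the index is built from text that differs from what the content table holds, this sketch assumes the index is populated once: FTS4 operations that re-read the content table, such as the `'rebuild'` command or deleting rows, would see the unmodified text.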

If you need more details, please leave a comment below.

Unicode characters are handled like 'normal' letters, so you can use them in FTS data and search terms. (Prefix searches should work, too.)

The problem is that non-ASCII characters are not normalized: every such character is treated as a letter, even if it is actually punctuation (―†) or another non-letter character (☺♫); upper and lower case are not merged; and diacritics are not removed.
If you want to handle those cases correctly, you have to apply these normalizations yourself, both to the documents before you insert them into the database and to the search terms before you use them in a query.
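
One way to apply a single normalization to both documents and search terms is sketched below. It is an illustration under assumptions, not a SQLite or Android API: `FtsQueryBuilder`, `normalize` and `toMatchQuery` are hypothetical names, case is folded with `Locale.ROOT`, diacritics are stripped after NFD decomposition, and every character that is not a letter, digit, or combining mark becomes a space. The `prefix` flag appends `*` to each term for a prefix search.

```java
import java.text.Normalizer;
import java.util.Locale;
import java.util.regex.Pattern;

/**
 * Hypothetical helper: one normalization applied both to documents (before
 * insert) and to user input (before MATCH).
 */
final class FtsQueryBuilder {

    private static final Pattern DIACRITICS =
            Pattern.compile("\\p{InCombiningDiacriticalMarks}+");

    private FtsQueryBuilder() {}

    /** Folds case, strips diacritics and turns every non-word character into a space. */
    static String normalize(String text) {
        // Lowercase first so a combining dot produced by lowercasing (e.g. U+0130) is stripped too.
        String lower = text.toLowerCase(Locale.ROOT);
        String stripped = DIACRITICS
                .matcher(Normalizer.normalize(lower, Normalizer.Form.NFD))
                .replaceAll("");
        String composed = Normalizer.normalize(stripped, Normalizer.Form.NFC);
        StringBuilder out = new StringBuilder(composed.length());
        for (int i = 0; i < composed.length(); ) {
            int cp = composed.codePointAt(i);
            // Letters, digits and combining marks stay; punctuation and symbols become spaces.
            out.appendCodePoint(isWordPart(cp) ? cp : ' ');
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    /** Builds a MATCH argument; space-separated terms are implicitly ANDed by FTS. */
    static String toMatchQuery(String userInput, boolean prefix) {
        StringBuilder query = new StringBuilder();
        for (String token : normalize(userInput).trim().split(" +")) {
            if (token.isEmpty()) {
                continue; // input contained no word characters at all
            }
            if (query.length() > 0) {
                query.append(' ');
            }
            query.append(token);
            if (prefix) {
                query.append('*');
            }
        }
        return query.toString();
    }

    private static boolean isWordPart(int cp) {
        if (Character.isLetterOrDigit(cp)) {
            return true;
        }
        int type = Character.getType(cp);
        return type == Character.NON_SPACING_MARK
                || type == Character.COMBINING_SPACING_MARK
                || type == Character.ENCLOSING_MARK;
    }
}
```

The stored documents must go through the same `normalize` call before insertion; otherwise a lowercased query will not match non-ASCII uppercase text in the index. On Android the result would be bound as a parameter, for example `db.rawQuery("SELECT docid FROM documents_fts WHERE documents_fts MATCH ?", new String[] {FtsQueryBuilder.toMatchQuery(input, true)})`, where `documents_fts` is a hypothetical table name.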