Word separators for Postgres full text search with Rails Word separators for Postgres full text search with Rails postgresql postgresql

Word separators for Postgres full text search with Rails


As stated in the documentation (and noticed by AJcodez), there is a solution in creating a dedicated column for tsvector index. Then define a trigger that catches insertions to index urls properly:

CREATE test_url (url varchar NOT NULL, url_tsvector tsvector NOT NULL);

This method will transorm any non alpha characters into single space and turn the string into a tsvector:

CREATE OR REPLACE FUNCTION generate_url_tsvector(varchar) RETURNS tsvector LANGUAGE sql AS $_$    SELECT to_tsvector(regexp_replace($1, '[^\w]+', ' ', 'gi'));$_$;

Now create a trigger that calls this function:

CREATE OR REPLACE FUNCTION before_insert_test_url()RETURNS TRIGGERLANGUAGE plpgsql AS $_$BEGIN;  NEW.url_tsvector := generate_url_tsvector(NEW.url);   RETURN NEW;END;$_$;CREATE TRIGGER before_insert_test_url_trig BEFORE INSERT ON test_url FOR EACH ROW EXECUTE PROCEDURE before_insert_test_url();

Now, when url are inserted, the `url_tsvectorè field will be automatically populated.

INSERT INTO test_url (url) VALUES ('http://www.google.fr');TABLE test_url; id          url                     url_tsvector              2  http://www.google.fr  'fr':4 'googl':3 'http':1 'www':2 (1 row)

To FT search on URLs you only need to query against this field.

SELECT * FROM test_url WHERE url_tsvector @@ 'google'::tsquery;


I ended up modifying the pg_search gem to support arbitrary ts_vector expressions instead of just column names.The changes are here

Now I can write:

pg_search_scope :search,     against: [[:title , 'B'], ["to_tsvector(regexp_replace(url, '[^\\w]+', ' ', 'gi'))", 'A']],    using: {tsearch: {dictionary: "simple"}}