
Are postgres JSON indexes efficient enough compared with classic normalized tables?


I will need some queries of the form "list all objects where one of altnames is 'foobar'." The expected table size is on the order of a few million records. Postgres JSON queries can handle that, and the lookup can also be indexed (see, for example, Index For Finding Element in JSON array). However, SHOULD it be done that way, or is it a perverse workaround that's not recommended?
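To make the question concrete, here is a sketch of the kind of schema and query I have in mind. The table and column names are made up for illustration, and this assumes the jsonb type with a GIN index (available since PostgreSQL 9.4):

```sql
-- Hypothetical table: each object stores its alternative names
-- in a jsonb array under the "altnames" key.
create table objects (
    id   serial primary key,
    data jsonb not null
);

-- GIN index so containment queries (@>) can use it.
create index ix_objects_data on objects using gin (data);

-- "List all objects where one of altnames is 'foobar'":
select *
from objects
where data @> '{"altnames": ["foobar"]}';
```

The `@>` containment operator matches rows whose `altnames` array contains the given element, and it is one of the operators a GIN index on jsonb supports.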

It can be done that way, but that doesn't mean you should. In some sense, the best practice is already well documented (see, for example, the discussions of using hstore vs. XML vs. EAV vs. a separate table): JSON is a new datatype which, for all intents and practical purposes (validation and syntax aside), is no different from the prior unstructured or semi-structured options.

Put another way, it's the same old pig with new makeup.

JSON offers the ability to use inverted search tree indexes, in the same way as hstore, array types and tsvectors do. They work fine, but keep in mind that they're primarily designed for extracting points in a neighborhood (think geometry types) ordered by distance, rather than for extracting a list of values in lexicographical order.

To illustrate, take the two plans that Roman's answer outlines:

  • The one that does an index scan plows through disk pages directly, retrieving the rows in the order indicated by the index.
  • The one that does a bitmap index scan starts by identifying every disk page that might contain a row, and reads them as they appear on disk, as if it was (and in fact, precisely like) doing a sequential scan that skips useless areas.
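If you want to see both plan shapes for yourself, you can nudge the planner with the `enable_*` settings (a debugging aid, not something for production; the table name here is illustrative):

```sql
-- Force the planner away from bitmap scans to see a plain index scan:
set enable_bitmapscan = off;
explain select * from objects where id between 100 and 200;

-- Restore the default and compare with the bitmap plan:
reset enable_bitmapscan;
explain select * from objects where id between 100 and 200;
```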

Getting back to your question: Cluttered and oversized inverted tree indexes will indeed improve the performance of your app if you use Postgres tables as giant JSON stores. But they're not a silver bullet either, and they won't get you as far as proper relational design when dealing with bottlenecks.

The bottom line, in the end, is no different from what you'd get when deciding to use hstore or an EAV:

  1. If it needs an index (i.e. it frequently appears in a where clause or, even more importantly, in a join clause), you likely want the data in a separate field.
  2. If it's primarily cosmetic, JSON/hstore/EAV/XML/whatever-makes-you-sleep-at-night works fine.
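As a middle ground between the two, a frequently queried JSON key can be promoted to a real, indexable column. A sketch, assuming PostgreSQL 12+ for stored generated columns (`objects` and `slug` are made-up names; on older versions an expression index on `data ->> 'slug'` achieves much the same):

```sql
-- Promote a hot JSON key to a plain column the planner can index directly.
alter table objects
    add column slug text generated always as (data ->> 'slug') stored;

create index ix_objects_slug on objects (slug);

-- This predicate can now use a regular btree index:
select * from objects where slug = 'foobar';
```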


I'd say it's worth a try. I created a small test (100,000 records, ~10 elements per JSON array) and checked how it performs:

```sql
create table test1 (id serial primary key, data json);
create table test1_altnames (id int, name text);

create or replace function array_from_json(_j json)
returns text[] as
$func$
    select array_agg(x.elem::text)
    from json_array_elements(_j) as x(elem)
$func$ language sql immutable;

with cte as (
    select
        (random() * 100000)::int as grp,
        (random() * 1000000)::int as name
    from generate_series(1, 1000000)
), cte2 as (
    select array_agg(name) as "AltNames"
    from cte
    group by grp
)
insert into test1 (data)
select row_to_json(t)
from cte2 as t;

insert into test1_altnames (id, name)
select id, json_array_elements(data->'AltNames')::text
from test1;

create index ix_test1 on test1 using gin(array_from_json(data->'AltNames'));
create index ix_test1_altnames on test1_altnames (name);
```

Query JSON (30ms on my machine):

```sql
select * from test1 where '{489147}' <@ array_from_json(data->'AltNames');
```

```
"Bitmap Heap Scan on test1  (cost=224.13..1551.41 rows=500 width=36)"
"  Recheck Cond: ('{489147}'::text[] <@ array_from_json((data -> 'AltNames'::text)))"
"  ->  Bitmap Index Scan on ix_test1  (cost=0.00..224.00 rows=500 width=0)"
"        Index Cond: ('{489147}'::text[] <@ array_from_json((data -> 'AltNames'::text)))"
```

Query table with names (15ms on my machine):

```sql
select * from test1 as t
where t.id in (select tt.id from test1_altnames as tt where tt.name = '489147');
```

```
"Nested Loop  (cost=12.76..20.80 rows=2 width=36)"
"  ->  HashAggregate  (cost=12.46..12.47 rows=1 width=4)"
"        ->  Index Scan using ix_test1_altnames on test1_altnames tt  (cost=0.42..12.46 rows=2 width=4)"
"              Index Cond: (name = '489147'::text)"
"  ->  Index Scan using test1_pkey on test1 t  (cost=0.29..8.31 rows=1 width=36)"
"        Index Cond: (id = tt.id)"
```

I should also note that there is some cost to inserting/deleting rows in the table with names (test1_altnames), so it's a bit more involved than just selecting rows. Personally, I like the JSON solution.