How can I handle different data types in an Entity-Attribute-Value design (e.g. single table with multiple columns or multiple tables per data type)?

This is a well-known problem. The difficulty with the approach you mentioned is that you need to know the type of an attribute before you can query for it. It's not the end of the world, since you manage the metadata, but still...

Two possible solutions might be:

  1. use a VARCHAR2 datatype to represent all data types in a known format. Numbers and chars are no problem, and date values can be written in a predefined manner (it's like implementing toString() in any OO design); a rough sketch follows this list.
  2. use the ANYDATA data type. I personally played around with it but decided not to use it.
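
For illustration, here is a minimal sketch of the first option as a single generic table; the table name, column names and attribute id are invented for this example, and the type column anticipates the point about downstream processors further down.

```sql
-- Hypothetical EAV table: every value stored as VARCHAR2 in a canonical format.
-- value_type records the logical type for consumers that need it.
CREATE TABLE attribute_values (
    entity_id    NUMBER        NOT NULL,
    attribute_id NUMBER        NOT NULL,
    value_type   VARCHAR2(10)  NOT NULL,   -- e.g. 'STRING', 'NUMBER', 'DATE'
    value        VARCHAR2(4000),
    CONSTRAINT attribute_values_pk PRIMARY KEY (entity_id, attribute_id)
);

-- Dates go in as YYYY/MM/DD so that string order matches date order.
-- Here attribute 100 is taken to be a birth date, purely as an example.
INSERT INTO attribute_values (entity_id, attribute_id, value_type, value)
VALUES (1, 100, 'DATE', TO_CHAR(DATE '1975-04-02', 'YYYY/MM/DD'));
```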


The easiest and most performant option is to convert all values in the database to strings. Problems such as those indicated will usually be obvious, and even properly typed columns suffer exactly the same kinds of issues, which usually show up as performance problems.

With a little care you can maintain collation order, if that matters (e.g. by formatting dates as year/month/day). Type validation should not be done by the database anyway, as by that point it is too late. Negative numbers are a pain, as are floats, but it is highly unusual to index by a number that can be negative or by a float, and in-memory sorts are generally fast.
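
As a concrete illustration of collation-friendly formats (the field widths here are arbitrary assumptions):

```sql
-- Year-first dates and zero-padded non-negative integers sort correctly as strings;
-- negative numbers and floats would need a more elaborate encoding.
SELECT TO_CHAR(DATE '2020-01-31', 'YYYY/MM/DD') AS date_as_string,  -- '2020/01/31'
       TO_CHAR(42, 'FM0000000000')              AS int_as_string    -- '0000000042'
FROM   dual;
```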

Where the type of the data is not obvious, or needs to be known by a downstream processor, add a type column.

Generally, all integrity constraints on column values can be checked before the record is written, either in application code (good) or in triggers (not so good). Trying to use the database's native features with varying types will only take you so far, and is probably not that useful anyway, since values often have many business-specific constraints, e.g. a birth date needs to be non-null and after 1900.
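
If you do go the trigger route, it might look something like the following sketch; it reuses the hypothetical attribute_values table and the assumption that attribute 100 is a birth date stored as YYYY/MM/DD.

```sql
CREATE OR REPLACE TRIGGER attribute_values_biu
BEFORE INSERT OR UPDATE ON attribute_values
FOR EACH ROW
BEGIN
    -- Example business rule: birth date must be present and after 1900.
    IF :NEW.attribute_id = 100 THEN
        IF :NEW.value IS NULL
           OR TO_DATE(:NEW.value, 'YYYY/MM/DD') < DATE '1901-01-01' THEN
            RAISE_APPLICATION_ERROR(-20001,
                'birth date must be non-null and after 1900');
        END IF;
    END IF;
END;
/
```

Doing the same check in application code before the write is usually the cleaner option, as noted above; the trigger is shown only because it keeps the rule next to the data.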

For performance, use compound indexes including the entity and attribute as prefixes. Indexes may be partitioned by the entity-attribute prefix, reducing any impact of the extra depth of the index, and they compress really well (the prefix will compress to one or two bytes), so the size difference is minimal.
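
Something along these lines, again using the made-up table from earlier (COMPRESS 2 prefix-compresses the two leading columns; partitioning by the same prefix is an option if it is available to you):

```sql
CREATE INDEX attribute_values_ix
    ON attribute_values (entity_id, attribute_id, value)
    COMPRESS 2;
```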

Querying EAV tables is often best done through views that unpack the entities for you, so the result comes back looking like the structure you would expect. That may be irrelevant if you are dealing with highly variable columns, e.g. patient forms, which are characterized by a large number of elements that vary with the history; in that case it is probably easier to process the rows in your business logic.
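
A sketch of such a view, pivoting a couple of invented attribute ids back into columns:

```sql
CREATE OR REPLACE VIEW patient_v AS
SELECT entity_id                                        AS patient_id,
       MAX(CASE WHEN attribute_id = 100 THEN value END) AS birth_date,
       MAX(CASE WHEN attribute_id = 101 THEN value END) AS family_name
FROM   attribute_values
GROUP BY entity_id;
```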

Finally, nowadays this kind of data is often simply not stored in relational columns at all. It is usually stored as an XML (or JSON) document (XMLType in Oracle), and most databases provide some native XML processing capability to search and manipulate such data. This is fine for normal storage and retrieval, but tends to make arbitrary queries such as "give me all patients over 60 who have had pneumonia in the last year" rather slow, or at least more involved, since tagged reverse indexing is needed. Nevertheless, it is worth checking whether a document-oriented/text-oriented approach is a better solution.
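
For a feel of what that looks like in Oracle, here is a rough sketch with an XMLTYPE column; the document shape and the element names are pure invention:

```sql
CREATE TABLE patient_docs (
    patient_id NUMBER PRIMARY KEY,
    doc        XMLTYPE
);

-- Find patients whose document records a particular diagnosis code.
SELECT patient_id
FROM   patient_docs
WHERE  XMLEXISTS('/patient/diagnoses/diagnosis[code="J18"]' PASSING doc);
```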

Good luck!