
Design Pattern for Custom Fields in Relational Database


Avoid stringly-typed data by replacing VALUE with NUMBER_VALUE, DATE_VALUE, STRING_VALUE. Those three types are good enough most of the time. You can add XMLTYPE and other fancy columns later if they're needed. And for Oracle, use VARCHAR2 instead of CHAR to conserve space.
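For example, a minimal sketch of what such a value table might look like (table and column names here are illustrative, borrowed from the ReportFieldValue/ReportField naming used in the queries below):

    --Sketch only: one typed column per family, at most one populated per row.
    create table ReportFieldValue
    (
        ReportFieldValueId  number primary key,
        ReportFieldId       number not null,   --references ReportField.id
        number_value        number,
        date_value          date,
        string_value        varchar2(4000),
        --Optional: enforce that at most one typed column is set.
        constraint ReportFieldValue_one_type check
        (
            case when number_value is not null then 1 else 0 end
          + case when date_value   is not null then 1 else 0 end
          + case when string_value is not null then 1 else 0 end <= 1
        )
    );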

Always try to store values as the correct type. Native data types are faster, smaller, easier to use, and safer.

Oracle has a generic data type system (ANYTYPE, ANYDATA, and ANYDATASET), but those types are difficult to use and should be avoided in most cases.

Architects often think using a single field for all data makes things easier. It makes it easier to generate pretty pictures of the data model, but it makes everything else more difficult. Consider these issues:

  1. You cannot do anything interesting with data without knowing the type. Even to display data it's useful to know the type, so the text can be justified correctly. In 99.9% of all use cases it will be obvious to the user which of the three columns is relevant.
  2. Developing type-safe queries against stringly-typed data is painful. For example, let's say you want to find "Date of Birth" for people born in this millennium:

    select *
    from ReportFieldValue
    join ReportField
        on ReportFieldValue.ReportFieldid = ReportField.id
    where ReportField.name = 'Date of Birth'
        and to_date(value, 'YYYY-MM-DD') > date '2000-01-01'

    Can you spot the bug? The above query is dangerous, even if you stored the date in the correct format, and very few developers know how to properly fix it. Oracle has optimizations that make it difficult to force a specific order of operations. You'll need a query like this to be safe:

    select *
    from
    (
        select ReportFieldValue.*, ReportField.*
            --ROWNUM ensures type safety by preventing view merging and predicate pushing.
            ,rownum
        from ReportFieldValue
        join ReportField
            on ReportFieldValue.ReportFieldid = ReportField.id
        where ReportField.name = 'Date of Birth'
    )
    where to_date(value, 'YYYY-MM-DD') > date '2000-01-01';

    You don't want to have to tell every developer to write their queries that way.
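By contrast, if the value is stored in a typed DATE_VALUE column as suggested above, the whole problem disappears. A sketch, assuming the same illustrative table and column names:

    --Nothing to convert, so there is nothing the optimizer can
    --evaluate against rows of the wrong type.
    select *
    from ReportFieldValue
    join ReportField
        on ReportFieldValue.ReportFieldid = ReportField.id
    where ReportField.name = 'Date of Birth'
        and ReportFieldValue.date_value > date '2000-01-01';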


Your design is a variation of the Entity Attribute Value (EAV) data model, which is often regarded as an anti-pattern in database design.

Maybe a better approach for you would be to create a reporting values table with, say, 300 columns (NUMBER_VALUE_1 through NUMBER_VALUE_100, VARCHAR2_VALUE_1..100, and DATE_VALUE_1..100).

Then, design the rest of your data model around tracking which reports use which columns and what they use each column for.

This has two benefits: first, you are not storing dates and numbers in strings (the benefits of which have already been pointed out), and second, you avoid many of the performance and data integrity issues associated with the EAV model.
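A heavily abbreviated sketch of that idea (all names are hypothetical, and only a few of the repeating columns are shown):

    --Wide value table: one row per report record, typed slots reused across reports.
    create table report_record_values
    (
        report_header_id  number not null,
        report_record_id  number not null,
        number_value_1    number,
        number_value_2    number,
        --... through number_value_100
        varchar2_value_1  varchar2(4000),
        --... through varchar2_value_100
        date_value_1      date,
        --... through date_value_100
        constraint report_record_values_pk
            primary key (report_header_id, report_record_id)
    );

    --Metadata: which report type uses which column, and what that column means.
    create table report_column_usage
    (
        report_type_id  number        not null,
        column_name     varchar2(30)  not null,   --e.g. 'DATE_VALUE_1'
        field_name      varchar2(100) not null,   --e.g. 'Date of Birth'
        constraint report_column_usage_pk
            primary key (report_type_id, column_name)
    );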

EDIT -- adding some empirical results of an EAV model

Using an Oracle 11gR2 database, I moved 30,000 records from one table into an EAV data model. I then queried the model to get those 30,000 records back.
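The query below references two EAV tables whose exact definitions were not posted; based on the columns it uses and the index in the execution plan, they presumably look roughly like this (a sketch, not the actual DDL):

    --Assumed shape of the EAV tables, inferred from the query and plan below.
    create table eav_report_fields
    (
        report_field_id    number primary key,
        report_type_id     number not null,
        report_field_name  varchar2(100) not null
    );

    create table eav_report_record_values
    (
        report_header_id  number not null,
        report_record_id  number not null,
        report_field_id   number not null references eav_report_fields (report_field_id),
        number_value      number,
        char_value        varchar2(4000),
        date_value        date
    );

    --The index range-scanned in the plan below.
    create index eav_report_record_values_n1 on eav_report_record_values (report_header_id);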

SELECT SUM (header_id * LENGTH (ordered_item) * (SYSDATE - schedule_ship_date))
FROM   (SELECT rf.report_type_id,
               rv.report_header_id,
               rv.report_record_id,
               MAX (DECODE (rf.report_field_name, 'HEADER_ID', rv.number_value, NULL)) header_id,
               MAX (DECODE (rf.report_field_name, 'LINE_ID', rv.number_value, NULL)) line_id,
               MAX (DECODE (rf.report_field_name, 'ORDERED_ITEM', rv.char_value, NULL)) ordered_item,
               MAX (DECODE (rf.report_field_name, 'SCHEDULE_SHIP_DATE', rv.date_value, NULL)) schedule_ship_date
        FROM   eav_report_record_values rv
               INNER JOIN eav_report_fields rf ON rf.report_field_id = rv.report_field_id
        WHERE  rv.report_header_id = 20
        GROUP BY rf.report_type_id, rv.report_header_id, rv.report_record_id)

The results were:

1 row selected.

Elapsed: 00:00:22.62

Execution Plan
----------------------------------------------------------
----------------------------------------------------------------------------------------------------
| Id  | Operation                       | Name                        | Rows  | Bytes | Cost (%CPU)|
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |                             |     1 |  2026 |    53  (67)|
|   1 |  SORT AGGREGATE                 |                             |     1 |  2026 |            |
|   2 |   VIEW                          |                             |   130K|   251M|    53  (67)|
|   3 |    HASH GROUP BY                |                             |   130K|   261M|    53  (67)|
|   4 |     NESTED LOOPS                |                             |       |       |            |
|   5 |      NESTED LOOPS               |                             |   130K|   261M|    36  (50)|
|   6 |       TABLE ACCESS FULL         | EAV_REPORT_FIELDS           |   350 | 15050 |    18   (0)|
|*  7 |       INDEX RANGE SCAN          | EAV_REPORT_RECORD_VALUES_N1 |   130K|       |     0   (0)|
|*  8 |      TABLE ACCESS BY INDEX ROWID| EAV_REPORT_RECORD_VALUES    |   372 |   749K|     0   (0)|
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   7 - access("RV"."REPORT_HEADER_ID"=20)
   8 - filter("RF"."REPORT_FIELD_ID"="RV"."REPORT_FIELD_ID")

Note
-----
   - 'PLAN_TABLE' is old version

Statistics
----------------------------------------------------------
          4  recursive calls
          0  db block gets
     275480  consistent gets
        465  physical reads
          0  redo size
        307  bytes sent via SQL*Net to client
        252  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

That's 22 seconds to get 30,000 rows of 4 columns each. That is way too long. From a flat table we'd be looking at under 2 seconds, easy.
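For comparison, the equivalent query against a flat table holding the same 30,000 rows needs no join and no pivot at all. This is only the shape of the query (the flat table name is hypothetical), not a benchmarked statement:

    --Hypothetical flat table with the same four columns: no join, no DECODE pivot.
    SELECT SUM (header_id * LENGTH (ordered_item) * (SYSDATE - schedule_ship_date))
    FROM   report_lines_flat;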


Use MariaDB, with its Dynamic Columns. Effectively, that lets you put all the miscellaneous columns into a single column, yet still gives you efficient access to them.

I would keep a few of the common fields in their own columns.
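A minimal sketch of how that might look, using MariaDB's COLUMN_CREATE / COLUMN_GET functions (the table and attribute names are made up for illustration):

    --Common fields get real columns; everything else goes into one dynamic-columns blob.
    CREATE TABLE report_record
    (
        record_id     INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
        report_id     INT UNSIGNED NOT NULL,
        created_date  DATE NOT NULL,
        extra_fields  BLOB                  --dynamic columns live here
    );

    INSERT INTO report_record (report_id, created_date, extra_fields)
    VALUES (1, '2015-06-01',
            COLUMN_CREATE('date_of_birth', DATE '2001-03-15',
                          'loyalty_points', 42));

    --Pull individual dynamic columns back out with explicit types.
    SELECT record_id,
           COLUMN_GET(extra_fields, 'date_of_birth'  AS DATE) AS date_of_birth,
           COLUMN_GET(extra_fields, 'loyalty_points' AS INT)  AS loyalty_points
    FROM   report_record;

    --Or dump everything in the blob as JSON for inspection.
    SELECT COLUMN_JSON(extra_fields) FROM report_record;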

More discussion of EAV and suggestions (and how to do it without Dynamic Columns).