When designing databases, what is the preferred way to store multiple true / false values?
In SQL Server
, there is BIT
datatype. You can store 0 or 1 there, compare the values but not run MIN
or MAX
.
In Oracle
, you just use NUMBER
or CHAR(1)
.
In MySQL
and PostgreSQL
any datatype is implicitly convertible to BOOLEAN
.
Both systems support BOOLEAN
datatype which you can use as is, without the operators, in the WHERE
or ON
clauses:
SELECT *FROM mytableWHERE col1
, which is impossible in SQL Server
and Oracle
(you need to have some kind or a predicate there).
In MySQL
, BOOLEAN
is a synonym for TINYINT(1)
.
In PostgreSQL
too (in terms of storage), but logically, it's not implicitly convertible to any other type.
From my own experience, I prefer char(1) for 'Y' or 'N'. Using 0 and 1 can be a bit confusing depending on how many beers I already drank and C++ main() function return 0 on success. ENUM and BIT types are more trouble than they are worth.
It is interesting to note that MySQL information_schema
uses VARCHAR(3) for 'YES' or 'NO'.
Example:
information_schema.USER_PRIVILEGES ( ... IS_GRANTABLE VARCHAR(3) NOT NULL DEFAULT '')
Instead of boolean datatypes, you may want to consider another data model for storing boolean values, which may be particularly appropriate in the following cases:
- when you will be having many yes/no columns.
- when you probably require adding more yes/no columns in the future.
- when the yes/no values do not change very frequently.
Defining user permissions may be a typical example of the above. Consider the following tables:
Table "Users": (user_id, name, surname, country)Table "Permissions": (permission_id, permission_text)Table "Users_Permissions": (user_id, permission_id)
In the Permissions
table you would define all the possible permissions that may be applicable to users. You would have to add a row to the Permissions
table for each yes/no attribute. As you may have noticed, this makes it very easy to add new permissions in the future without having to modify the database schema.
With the above model, you would then indicate a TRUE value by assigning a user_id
with a permission_id
in the Users_Permissions
table. Otherwise it would be FALSE by default.
For example:
Table "Permissions"permission_id text-----------------------------------1 "Read Questions"2 "Answer Questions"3 "Edit Questions"4 "Close Questions"Table "Users_Permissions"user_id permission_id-----------------------------------1 11 21 32 12 3
Advantages
- Indexing: you can easily use an index on to query for specific facts.
- Space: The default convention saves space when you have many false values.
- Normalized: Facts are defined in their own tables (in the
Permissions
andUsers_Permissions
tables.) You can easily store more information on each fact.
Disadvantages
- Queries: Simple queries would require JOINs.
- Setting to False: To set a value to false, you would have to delete a row (from the
Users_Permissions
table.) Otherwise you can use a 'deleted' flag in theUsers_Permissions
table, which would also allow you to store information for audit trails, such as when a permission was modified and by whom. If you delete the row, you would not be able to store this information.