Database schema confusing (Index and Constraints) Database schema confusing (Index and Constraints) mysql mysql

Database schema confusing (Index and Constraints)


I'd prefer the second approach. By using surrogate ID numbers when they're not logically necessary for identification, you introduce more mandatory joins. This requires you to "chase ID numbers all over the database", which is the SQL equivalent to "chasing pointers all over the database". Chasing pointers was characteristic of IMS, one of the database architectures the relational model intended to replace. (IMS uses a hierarchical architecture.) There's no point reinventing it today. (Although a lot of people do just that.)

If you have, for example, five levels of surrogate ID numbers, and you want a person's name, you have to do four joins to get it. Using the second approach, you just need one join. If you don't want to write multi-column joins, use CREATE VIEW and do it just once.

Performance is simple to test. Just generate a few million random-ish rows using your favorite scripting language, and load them into a test server. You'll not only find where your performance problems are hiding, you'll find all the errors in your CREATE TABLE code. (Your code won't work as-is.) Learn about EXPLAIN if you don't already know about it.

As for indexing, you can test that on the random-ish rows you generate and load. A multi-column index on (first_name, last_name) will work best if users always supply a first name. But a lot of users won't do that, preferring to search by last name instead. A multi-column index on (first_name, last_name) isn't effective for users who prefer to search by last name. You can test that.

For that reason alone, indexing of first names and last names is usually more effective if there are two separate indexes, one for the first name, and one for the last name.


What does chasing id numbers mean?

The unspoken design pattern underlying this question is "Every row must have an id number, and all foreign keys must reference the id number." In a SQL database, it's actually an anti-pattern. As a rule of thumb, any pattern that lets you design tables without thinking about keys should be presumed guilty until proven innocent--it should be presumed to be an anti-pattern until proven not to be.

create table A ( a_id integer primary key, a_1 varchar(15) not null unique, a_2 varchar(15) not null);create table B (  b_id integer primary key  a_id integer not null references A (a_id),  b_1  varchar(10) not null,  unique (a_id, b_1),);create table C (  c_id integer primary key,  b_id integer not null references B (b_id),  c_1 char(3) not null,  c_2 varchar(20) not null,  unique (b_id, c_1));create table D (  d_id integer primary key,  c_id integer not null references C (c_id),  d_1 integer not null,  d_2 varchar(15),  unique (c_id, d_1));

If you need a report on table "D", and the report needs

  • columns D.d_1 and D.d_2, and
  • columns A.a_1 and A.a_2,

you need 3 joins to get to it. (Try it.) You're chasing ID numbers. (Like chasing pointers in IMS.) The following structure is different.

create table A ( a_1 varchar(15) primary key, a_2 varchar(15) not null);create table B (  a_1 varchar(15) not null references A (a_1),  b_1  varchar(10) not null,  primary key (a_1, b_1),);create table C (  a_1 varchar(15) not null,  b_1 varchar(10) not null,  c_1 char(3) not null,  c_2 varchar(20) not null,  primary key (a_1, b_1, c_1),  foreign key (a_1, b_1) references B (a_1, b_1));create table D (  a_1 varchar(15) not null,  b_1 varchar(10) not null,  c_1 char(3) not null,  d_1 integer not null,  d_2 varchar(15),  primary key (a_1, b_1, c_1, d_1),  foreign key (a_1, b_1, c_1) references C (a_1, b_1, c_1));

With this structure, the same report needs a single join.

select D.d_1, D.d_2, A.a_1, A.a_2from Dinner join A on D.a_1 = A.a_1;


The first approach would be my preference

If you needed a table dependent on PersonJobDescription, say AgentContact, you can easily link to the surrogate Rec_ID, without it, you have to start jumping through hoops

The other reason would be what if it became a requirement to hold a Person/JobDescription for each year?Before you know where you are at you'll be having a four vakue compound key that still doesn't do the job.The rule Compound Primary Keys, should be a last resort will make your designs more flexible and resilient.