
When to store precalculated values vs calculating them when retrieving them?


About your two questions:

I can tell new developers that these structures are not normalized, but they can say it's faster. How do I counter that? Do I counter that? Do others structure their databases like this?!

It may be faster, but not necessarily so: whenever you decide to add extra information to a table (the extra fields, in your case) you also add a performance penalty, because the table gets bigger. That may mean more data travelling from server to clients, or more data to be paged in and out of memory. Also, if a field is there to speed up queries, it will probably have one or more indexes on it, which again carry a performance penalty during updates and inserts.

The main point, though, is the one I hinted at in my comment: "cached" and "precomputed" values make the system more fragile in terms of data integrity. Are you sure that "event_creator_id" always points to the real creator, even after someone has amended the original value? If yes, this has costs too, both in terms of computation (you have to update all the tables whenever the creator changes) and in terms of actual development and testing effort (are you sure nobody forgot to propagate changes to the precomputed fields?).
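To make that fragility concrete, here is a minimal sketch (using SQLite, with made-up table and column names since the original schema isn't shown) of how a denormalized event_creator_id silently drifts out of sync as soon as the source row is amended:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (
        id         INTEGER PRIMARY KEY,
        creator_id INTEGER NOT NULL          -- the authoritative value
    );
    CREATE TABLE registrations (
        id               INTEGER PRIMARY KEY,
        event_id         INTEGER NOT NULL REFERENCES events(id),
        event_creator_id INTEGER NOT NULL    -- denormalized copy of events.creator_id
    );
""")
conn.execute("INSERT INTO events (id, creator_id) VALUES (1, 100)")
conn.execute("INSERT INTO registrations (id, event_id, event_creator_id) VALUES (1, 1, 100)")

# Someone amends the original creator...
conn.execute("UPDATE events SET creator_id = 200 WHERE id = 1")

# ...and the cached copy is now wrong until every code path remembers to propagate the change.
row = conn.execute("""
    SELECT e.creator_id, r.event_creator_id
    FROM events e JOIN registrations r ON r.event_id = e.id
""").fetchone()
print(row)  # (200, 100) -- authoritative value and cached value disagree
```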

The same goes for aggregate values like "discounted price" or running totals... and changing the original data is probably much more frequent than changing the "event creator" information. Again, is there a proper "cache invalidation" mechanism in place to ensure that the total sales are recomputed whenever someone completes a sale? What about returned items? Has anyone considered the cost of ensuring integrity?

Running totals and other derived values should be implemented using views instead, so that caching, if any, is performed by the actual DBMS engine, which knows how to take care of it properly.
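As an illustration of that approach, here is a minimal sketch (again SQLite, with hypothetical table and column names): the total is exposed through a view, so it is recomputed from the source rows on every query and can never go stale, and a return is just another row rather than a separate invalidation step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (
        id       INTEGER PRIMARY KEY,
        customer TEXT    NOT NULL,
        amount   NUMERIC NOT NULL            -- negative rows can represent returns
    );

    -- The derived value lives in a view: it always agrees with the source rows.
    CREATE VIEW customer_totals AS
        SELECT customer, SUM(amount) AS total_sales
        FROM sales
        GROUP BY customer;
""")
conn.executemany("INSERT INTO sales (customer, amount) VALUES (?, ?)",
                 [("alice", 50), ("alice", 30), ("bob", 20)])
print(conn.execute("SELECT * FROM customer_totals ORDER BY customer").fetchall())
# [('alice', 80), ('bob', 20)]

# A returned item is simply another row; no cache invalidation code is needed.
conn.execute("INSERT INTO sales (customer, amount) VALUES ('alice', -30)")
print(conn.execute("SELECT * FROM customer_totals ORDER BY customer").fetchall())
# [('alice', 50), ('bob', 20)]
```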

Is there a rule of thumb, or a set of principles, which I can use to say - 'oh, it will be slower, but only by 1%, so it's okay to do it this way', etc.?

A DB (or arguably any kind of computing system) should be "correct first", so that you can then figure out how to make it "fast enough, second". Trading correctness for speed is a decision you should not take when designing the DB, unless you already know that timeliness is considered far more important than correctness, i.e. your requirements clearly state that having possibly wrong or outdated information matters less than response time.

In other words: designing a table with redundant cached info is another example of premature optimization and should be avoided at all costs.

See also this - especially the answers


Every DB book I have read on relational design has included a section on "planned" redundancy or "limited" de-normalization. It depends on the environment. Wells Fargo pre-calculates bank statement totals and stores the pre-calculated values.

Imagine how long it would take to do those calculations if they waited until the end of each cycle, when they go to print the statements.
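A hedged sketch of what such planned redundancy can look like (hypothetical table and job names, not the actual bank's pipeline): the totals are recomputed by a single, scheduled batch step at cycle close and stored next to the statement, so the expensive aggregation is paid once rather than at print time, and consistency is managed in one place.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transactions (
        account TEXT    NOT NULL,
        cycle   TEXT    NOT NULL,            -- e.g. '2024-05'
        amount  NUMERIC NOT NULL
    );
    CREATE TABLE statement_totals (
        account TEXT    NOT NULL,
        cycle   TEXT    NOT NULL,
        total   NUMERIC NOT NULL,
        PRIMARY KEY (account, cycle)
    );
""")
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)",
                 [("acct-1", "2024-05", 120.00),
                  ("acct-1", "2024-05", -45.50),
                  ("acct-2", "2024-05", 10.00)])

def close_cycle(conn, cycle):
    """Planned batch step: recompute and store statement totals for one cycle.

    Because the redundancy is written by this one job rather than scattered
    through application code, keeping it consistent stays tractable.
    """
    conn.execute("""
        INSERT OR REPLACE INTO statement_totals (account, cycle, total)
        SELECT account, cycle, SUM(amount)
        FROM transactions
        WHERE cycle = ?
        GROUP BY account, cycle
    """, (cycle,))

close_cycle(conn, "2024-05")
# Printing a statement now reads the stored total instead of re-aggregating.
print(conn.execute("SELECT * FROM statement_totals ORDER BY account").fetchall())
# [('acct-1', '2024-05', 74.5), ('acct-2', '2024-05', 10)]
```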

Planned redundancy is Normal!