Which database model should I use for dynamic modification of entities/properties during runtime? Which database model should I use for dynamic modification of entities/properties during runtime? python python

Which database model should I use for dynamic modification of entities/properties during runtime?


The SQL or NoSQL choice is not your problem. You need to read little more about database design in general. As you said, you're not a database expert(and you don't need to be), butyou absolutely must study a little more the RDBMS paradigm.

It's a common mistake for amateur enthusiasts to choose a NoSQL solution. Sometimes NoSQLis a good solution, most of the times is not.

Take for example MongoDB, which you mentioned(and is one of the good NoSQL solutions I've tried). Schema-less, right? Err.. not exactly. You see when something is schema-less means no constraints, validation, etc. But your application's models/entities can't stand on thin air! Surely there will be some constraints and validation logic which you will implement on your software layer. So I give you mongokit! I will just quote from the project's description this tiny bit

MongoKit brings structured schema and validation layer on top of the great pymongo driver

Hmmm... unstructured became structured.

At least we don't have SQL right? Yeah, we don't. We have a different query language which is of course inferior to SQL. At least you don't need to resort to map/reduce for basic queries(see CouchDB).

Don't get me wrong, NoSQL(and especially MongoDB) has its purpose, but most of the times these technologies are used for the wrong reason.

Also, if you care about serious persistence and data integrity forget about NoSQL solutions. All these technologies are too experimental to keep your serious data. By researching a bit who(except Google/Amazon) uses NoSQL solutions and for what exactly, you will find that almost no one uses it for keeping their important data. They mostly use them for logging, messages and real time data. Basically anything to off-load some burden from their SQL db storage.

Redis, in my opinion, is probably the only project who is going to survive the NoSQL explosion unscathed. Maybe because it doesn't advertise itself as NoSQL, but as a key-value store, which is exactly what it is and a pretty damn good one! Also they seem serious about persistence. It is a swiss army knife, but not a good solution to replace entirely your RDBMS.

I am sorry, I said too much :)

So here is my suggestion:

1) Study the RDBMS model a bit.

2) Django is a good framework if most of your project is going to use an RDBMS.

3) Postgresql rocks! Also keep in mind that version 9.2 will bring native JSON support. You could dump all your 'dynamic' properties in there and you could use a secondary storage/engine to perform queries(map/reduce) on said properties. Have your cake and eat it too!

4) For serious search capabilities consider specialized engines like solr.

EDIT: 6 April 2013

5) django-ext-hstore gives you access to postgresql hstore type. It's similarto a python dictionary and you can perform queries on it, with the limitation thatyou can't have nested dictionaries as values. Also the value of key can be onlyof type string.

Have fun


Update in response to OP's comment

0) Consider the application 'contains data' and has already been used for a while

I am not sure if you mean that it contains data in a legacy dbms or you are justtrying to say that "imagine that the DB is not empty and consider the following points...".In the former case, it seems a migration issue(completely different question), in the latter, well OK.

1) Admin deletes entity "family" and all related data

Why should someone eliminate completely an entity(table)? Either your application has to do with families, houses, etc or it doesn't. Deleting instances(rows) of families is understandable of course.

2) Admin creates entity "house"

Same with #1. If you introduce a brand new entity in your app then most probably it will encapsulate semantics and business logic, for which new code must be written. This happens to all applications as they evolve through time and of course warrants a creation of a new table, or maybe ALTERing an existing one. But this process is not a part of the functionality of your application. i.e. it happens rarely, and is a migration/refactoring issue.

3) Admin adds properties "floors", "age", ..

Why? Don't we know beforehand that a House has floors? That a User has a gender?Adding and removing, dynamically, this type of attributes is not a feature, but a design flaw. It is part of the analysis/design phase to identify your entities and their respective properties.

4) Privileged user adds some houses.

Yes, he is adding an instance(row) to the existing entity(table) House.

5) User searches for all houses with at least five floors cheaper than 100 $

A perfectly valid query which can be achieved with either SQL or NoSQL solution.In django it would be something along those lines:

House.objects.filter(floors__gte=5, price__lt=100)

provided that House has the attributes floors and price. But if you need to do text-based queries, then neither SQL nor NoSQL will be satisfying enough. Because you don't want to implement faceting or stemming on your own! You will use some of the already discussed solutions(Solr, ElasticSearch, etc).

Some more general notes:

The examples you gave about Houses, Users and their properties, do not warrant any dynamic schema. Maybe you simplified your example just to make your point, but you talk about adding/removing Entities(tables) as if they are rows in a db. Entities are supposed to be a big deal in an application. They define the purpose of your application and its functionality. As such, they can't change every minute.

Also you said:

The idea is that all instances ("rows") of a certain entity ("table") share the same set of properties/attributes ("columns"). However, it will be perfectly valid if certain instances have an empty value for certain properties/attributes.

This seems like a common case where an attribute has null=True.

And as a final note, I would like to suggest to you to try both approaches(SQL and NoSQL), since it doesn't seem like your career depends on this project. It will be a beneficiary experience, as you will understand first-hand, the cons and pros of each approach. Or even how to "blend" these approaches together.


What you're asking about is a common requirement in many systems -- how to extend a core data model to handle user-defined data. That's a popular requirement for packaged software (where it is typically handled one way) and open-source software (where it is handled another way).

The earlier advice to learn more about RDBMS design generally can't hurt. What I will add to that is, don't fall into the trap of re-implementing a relational database in your own application-specific data model! I have seen this done many times, usually in packaged software. Not wanting to expose the core data model (or permission to alter it) to end users, the developer creates a generic data structure and an app interface that allows the end user to define entities, fields etc. but not using the RDBMS facilities. That's usually a mistake because it's hard to be nearly as thorough or bug-free as what a seasoned RDBMS can just do for you, and it can take a lot of time. It's tempting but IMHO not a good idea.

Assuming the data model changes are global (shared by all users once admin has made them), the way I would approach this problem would be to create an app interface to sit between the admin user and the RDBMS, and apply whatever rules you need to apply to the data model changes, but then pass the final changes to the RDBMS. So for example, you may have rules that say entity names need to follow a certain format, new entities are allowed to have foreign keys to existing tables but must always use the DELETE CASCADE rule, fields can only be of certain data types, all fields must have default values etc. You could have a very simple screen asking the user to provide entity name, field names & defaults etc. and then generate the SQL code (inclusive of all your rules) to make these changes to your database.

Some common rules & how you would address them would be things like:

-- if a field is not null and has a default value, and there are already existing records in the table before that field was added by the admin, update existing records to have the default value while creating the field (multiple steps -- add field allowing null; update all existing records; alter the table to enforce not null w/ default) -- otherwise you wouldn't be able to use a field-level integrity rule)

-- new tables must have a distinct naming pattern so you can continue to distinguish your core data model from the user-extended data model, i.e. core and user-defined have different RDBMS owners (dbo. vs. user.) or prefixes (none for core, __ for user-defined) or somesuch.

-- it is OK to add fields to tables that are in the core data model (as long as they tolerate nulls or have a default), and it is OK for admin to delete fields that admin added to core data model tables, but admin cannot delete fields that were defined as part of the core data model.

In other words -- use the power of the RDBMS to define the tables and manage the data, but in order to ensure whatever conventions or rules you need will always be applied, do this by building an app-to-DB admin function, instead of giving the admin user direct DB access.

If you really wanted to do this via the DB layer only, you could probably achieve the same by creating a bunch of stored procedures and triggers that would implement the same logic (and who knows, maybe you would do that anyway for your app). That's probably more of a question of how comfortable are your admin users working in the DB tier vs. via an intermediary app.


So to answer your questions directly:

(1) Yes, add tables and columns at run time, but think about the rules you will need to have to ensure your app can work even once user-defined data is added, and choose a way to enforce those rules (via app or via DB / stored procs or whatever) when you process the table & field changes.

(2) This issue isn't strongly affected by your choice of SQL vs. NoSQL engine. In every case, you have a core data model and an extended data model. If you can design your app to respond to a dynamic data model (e.g. add new fields to screens when fields are added to a DB table or whatever) then your app will respond nicely to changes in both the core and user-defined data model. That's an interesting challenge but not much affected by choice of DB implementation style.

Good luck!


Maybe doesn't matter the persistence engine of your model objects (RDBMS, NoSQL, etc...). The technology you're looking for is an index to search for and find your objects.

I think you need to find your objects using their schema. So, if the schema is defined dynamically and persisted on a database you can build dynamic search forms, etc.. Some kind of reference of the entity and attributes to the real objects is needed.

Give a look to the Entity-Attribute-Model pattern (EAV). This can be implemented over SQLAlchemy to use an RDBMS database as mean to store vertically your schema and data and relate thems.

You're entering in the field of the Semantic Web Programming, maybe you should read at less the first chapter of this book:

Programming The Semantic Web

it tells the whole story of your problem: from rigid schemas to dynamic schemas, implemented first as a key-value store and later improved to a graph persistence over a relational model.

My opinion is that the best implementations of this could be achieved nowadays with graph databases and a very good example of current implementations are Berkeley DBs (some LDAP implementations use Berkeley DBs as a tech implementation to this indexing problem.)

Once in a graph model you could do some kind of "inferences" on the graph, making appear the DB with somekind of "intelligence". An example of this is stated on the book.