Data object storage - Can table JOIN's do what single table SELECT's cannot?


Just because you can, doesn't mean you should.

Cons of the multiple-SELECT alternative:

  • the fewer trips to the database, the better. The TCP overhead of each round trip cannot be recouped, and it looks like Network Neutrality is officially dead, so we could expect to see a movement away from multi-select/NoSQL because you might have to pay for that bandwidth...
  • because of the delay between the initial and subsequent statements, there is a risk that the supporting data no longer reflects what was in the system when the first query ran
  • less scalable: the larger the data set, the more work the application does to handle business rules and associations that scale far better in a database
  • more complexity in the application, which also makes the business logic less portable (e.g. migrating from Java to .NET or vice versa means building from scratch, whereas business logic in the DB would minimize that)
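
To make the round-trip point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `user` and `company` tables, their rows, and the `CountingConnection` helper are all made up for illustration; an in-memory SQLite database has no network at all, so the sketch only counts statements, each of which would be a round trip against a remote server.

```python
import sqlite3


def build_db():
    """Create a throwaway in-memory database (hypothetical schema and rows)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE company (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT, company_id INTEGER);
        INSERT INTO company VALUES (1, 'Acme'), (2, 'Globex');
        INSERT INTO user VALUES (1, 'ann', 1), (2, 'bob', 2), (3, 'cy', 1);
    """)
    return conn


class CountingConnection:
    """Hypothetical wrapper that counts how many statements are sent."""

    def __init__(self, conn):
        self.conn = conn
        self.statements = 0

    def query(self, sql, params=()):
        self.statements += 1
        return self.conn.execute(sql, params).fetchall()


def names_with_multiple_selects(db):
    """One SELECT for the users, then one more SELECT per user."""
    result = []
    for user_name, company_id in db.query(
        "SELECT name, company_id FROM user ORDER BY id"
    ):
        company = db.query("SELECT name FROM company WHERE id = ?", (company_id,))
        result.append((user_name, company[0][0]))
    return result


def names_with_join(db):
    """The same answer from a single statement."""
    return db.query(
        "SELECT u.name, c.name FROM user AS u "
        "JOIN company AS c ON c.id = u.company_id ORDER BY u.id"
    )


if __name__ == "__main__":
    multi_db = CountingConnection(build_db())
    join_db = CountingConnection(build_db())
    print(names_with_multiple_selects(multi_db), multi_db.statements)
    print(names_with_join(join_db), join_db.statements)
```

The multi-select version issues one statement plus one per user row, so its statement count grows with the data, while the JOIN version stays at one.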


1 - running multiple separate queries leaves you with a concurrency mess - by the time you get something from table 1, it could have been deleted there while still being present in table 2 - now assume 5 correlated tables.

2 - running queries with at least moderately complex logic over fields other than the mythical ID

3 - controlling the amount of data fetched (you hardly ever need more than 50% of the data that is needed to deserialize/create valid objects, or even worse, whole trees of connected objects)

4 - correlated queries (nested selects), which a SQL server will optimize like joins to additive complexity or better (|T1|+|T2|+|T3|+|T4|), while any ORM or non-SQL store will have to keep repeating inner queries, giving rise to multiplicative complexity (|T1|*|T2|*|T3|*|T4|)
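
The additive versus multiplicative difference can be sketched for two tables in plain Python. This is only an illustration under assumptions: a hash join is one strategy a query planner may choose, not a description of what any particular server does, and the function names and the step counting are invented for the example.

```python
def nested_loop_join(left, right):
    """Application-side join: re-scan `right` for every row of `left`.

    Rows are (key, value) pairs. Returns the matches and the number of
    row comparisons performed, which is |left| * |right|.
    """
    steps = 0
    matches = []
    for l_key, l_val in left:
        for r_key, r_val in right:
            steps += 1
            if l_key == r_key:
                matches.append((l_val, r_val))
    return matches, steps


def hash_join(left, right):
    """Build an index over `right` once, then probe it once per `left` row.

    Returns the matches and the number of rows touched, which is
    |right| + |left|.
    """
    steps = 0
    index = {}
    for r_key, r_val in right:
        steps += 1
        index.setdefault(r_key, []).append(r_val)
    matches = []
    for l_key, l_val in left:
        steps += 1
        for r_val in index.get(l_key, []):
            matches.append((l_val, r_val))
    return matches, steps


if __name__ == "__main__":
    left = [(i, f"order{i}") for i in range(100)]
    right = [(i, f"customer{i}") for i in range(100)]
    print(nested_loop_join(left, right)[1])
    print(hash_join(left, right)[1])
```

Both functions return the same matches; only the amount of work differs, and the gap widens with every additional table chained onto the repeated inner query.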

5 - dataset sizes: scalability not just in dataset size but also in handling concurrency under updates. Even ORMs that maintain transactions make them so long that the chances of deadlocks rise sharply.

6 - blind updates (a lot more data touched for no reason) and their dependency on, and failure based on, a blind instrument (the mythical version, which is realistically needed in, say, 1% of a relational data model, but ORMs and the like have to have it everywhere)

7 - lack of any standards and compatibility - this means that your system and data will always be at much higher risk and dependent on software changes driven by academic adventurism rather than any actual business responsibility, and you should expect to invest a lot of resources just in testing changes.

8 - data integrity - oops, some code just deleted half of today's order records from T1, since there was no foreign key to T2 to stop it. A perfectly normal thing to happen with separate queries.
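
A small sketch of the foreign-key point, again with Python's sqlite3 module. It assumes T2 holds rows that reference T1 (the answer does not spell out the direction), and the column names and rows are hypothetical. SQLite only enforces foreign keys when `PRAGMA foreign_keys` is switched on for the connection, which makes it easy to compare both behaviours.

```python
import sqlite3


def make_db(enforce_foreign_keys):
    """Build an in-memory database where t2 rows reference t1 rows."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys per connection, and only when asked to.
    conn.execute(
        "PRAGMA foreign_keys = " + ("ON" if enforce_foreign_keys else "OFF")
    )
    conn.executescript("""
        CREATE TABLE t1 (id INTEGER PRIMARY KEY, note TEXT);
        CREATE TABLE t2 (
            id INTEGER PRIMARY KEY,
            t1_id INTEGER NOT NULL REFERENCES t1(id),
            item TEXT
        );
        INSERT INTO t1 VALUES (1, 'order');
        INSERT INTO t2 VALUES (10, 1, 'widget');
    """)
    return conn


def try_delete_order(conn, order_id):
    """Return True if the delete went through, False if the database refused."""
    try:
        conn.execute("DELETE FROM t1 WHERE id = ?", (order_id,))
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        conn.rollback()
        return False


if __name__ == "__main__":
    print(try_delete_order(make_db(True), 1))   # constraint enforced
    print(try_delete_order(make_db(False), 1))  # nothing stops the delete
```

With enforcement on, the database rejects the delete because a t2 row still points at the order; with it off, the delete succeeds and leaves an orphaned t2 row behind, which is the situation the answer describes.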

9 - negative maturity trend - it keeps splintering instead of standardizing - give it 20 years and maybe it will get stable

Last but not least - it doesn't reduce any complexity (the same correlation between data is still there), but it makes it very hard to track and manage that complexity, or to have any realistic remedy or transparency when something goes wrong. And it adds the complexity of 1-2 extra layers. If something goes wrong in your SQL tables, you have tools and queries to discover and even fix your data. What are you going to do when some ORM just tells you that it has an "invalid pointer" and throws an exception, since surely you don't want an "invalid object"?

I think that's enough :-)


Actually, one of the biggest problems is that some of the NoSQL databases are not transactional across multiple queries. ORMs like Hibernate will sometimes do multiple queries without "joining", but have the advantage that the queries run within the same transaction.

With NoSQL you do not have that luxury. So this could very easily have misleading results:

SELECT * FROM user WHERE id = 4
SELECT * FROM company WHERE id = {user.company_id}

The result is misleading if the company for user.company_id is deleted between the two statement calls. This is a well-known issue with these databases. So regardless of whether or not you can properly do JOINs, the issue will be not having transactions.
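
The window between the two statements can be simulated in a single process. The sketch below uses Python's sqlite3 module with hypothetical rows; the `between` callback is an invented stand-in for another client deleting the company at the worst moment, since a real race needs a second connection. In most relational databases, a single JOIN statement, or both reads inside one transaction with suitable isolation, avoids this half-old, half-new view.

```python
import sqlite3


def make_db():
    """In-memory database with one user that points at one company."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE company (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT, company_id INTEGER);
        INSERT INTO company VALUES (7, 'Acme');
        INSERT INTO user VALUES (4, 'ann', 7);
    """)
    return conn


def load_in_two_steps(conn, user_id, between=None):
    """Two separate reads; `between` runs in the gap between them."""
    user = conn.execute(
        "SELECT name, company_id FROM user WHERE id = ?", (user_id,)
    ).fetchone()
    if between is not None:
        between()  # simulated interference from another client
    company = conn.execute(
        "SELECT name FROM company WHERE id = ?", (user[1],)
    ).fetchone()
    return user[0], (company[0] if company else None)


def load_with_join(conn, user_id):
    """One statement: either the whole (user, company) row or nothing."""
    return conn.execute(
        "SELECT u.name, c.name FROM user AS u "
        "JOIN company AS c ON c.id = u.company_id WHERE u.id = ?",
        (user_id,),
    ).fetchone()


if __name__ == "__main__":
    conn = make_db()
    delete_company = lambda: conn.execute("DELETE FROM company WHERE id = 7")
    print(load_in_two_steps(conn, 4, between=delete_company))
    print(load_with_join(make_db(), 4))
```

The two-step loader hands back a user whose company has vanished, and the application has to notice and handle that itself; the JOIN never returns a user paired with a missing company.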

Otherwise you can model anything so long as it can store bytes :)