DISTINCT with PARTITION BY vs. GROUPBY
Some very rudimentary testing on a large table with unindexed columns showed that at least in my case the two queries generated a completely different query plan. The one for
PARTITION BY was significantly slower.
GROUP BY query plan included only a table scan and aggregation operation while the
PARTITION BY plan had two nested loop self-joins. The
PARTITION BY took about 2800ms on the second run, the
GROUP BY took only 500ms.
Readability / Maintainability:
Based on the opinions of the commenters here the
PARTITION BY is less readable for most developers so it will be probably also harder to maintain in the future.
PARTITION BY gives you more flexibility in choosing the grouping columns. With
GROUP BY you can have only one set of grouping columns for all aggregated columns. With
DISTINCT + PARTITION BY you can have different column in each partition. Also on some DBMSs you can chose from more aggregation/analytic functions in the
sum() as an analytic function with
over partition by is not necessary. I don't think there is a big difference between them in any sense. In oracle there are lot more analytic function than aggregation function. I think ms-sql is the same case. And for example
dense rank(), etc are much harder to implement with only
group by.Of course this argument is not really for defending the first version...
Maybe there were previously more computed fields in the result set which are not implementable with group by.
Although both queries seem to compute the same thing when you look at the columns, they are actually producing completely different set of rows.
The first one using the analytical function will output exactly one row for each input row. That is for EACH stock information, it will return a row with the total quantity for the associated company/warehouse/item. (by the way computing the average would make more sense to me but who knows...)
The second one will only return a single row for each company/warehouse/item combinaison.
So yes, in that example the first query seems a bit useless... unless you want to compute some stock level statistic like the current stock ratio over the overall quantity by company/warehouse/item (just an example, don't know if it has any business meaning!)
Analytical function are very powerful mechanism in SQL, in some sense way more powerful than a group-by. But use it with care... A simple rule of thumb could be: if you can compute it using a group-by, well, don't use an analytical function ;)