How to speed up a Cosmos DB aggregate query? How to speed up a Cosmos DB aggregate query? azure azure

How to speed up a Cosmos DB aggregate query?


The Cosmos DB team has now made some significant changes to aggregation performance and how indexes are used. This is their indexing "v2" strategy and was only recently rolled out (it may not be available to all accounts yet, contact MSFT if you have an older db that needs upgrading).

You can compare the new results to the picture I originally posted.

You'll note now that Document load time shows as 0ms and the retrieved document size is 0 bytes. The load time I can confirm is really quite fast now so it is possible it is under 1ms when measured from the server side. And document size of 0 makes more sense since no documents need to be retrieved for this (only count based on the index).

Finally you can see that the RUs dropped from 3222 to 7.4 !!!! A pretty drastic difference.

Summing on multiple columns at once within a single partition is also quite performant now and we can do about 8 sums at once across 2 million documents with ~50 RUs and it takes about 20-70ms when measured from a function API endpoint (so includes network time).

More work still needs to be done by Cosmos DB team to allow for cross partition multi-column aggregations, but the improvements we have now are quite promising.

enter image description here


For the specific query shown, there is no need to specify table name, and you could try to limit 1, some performance will be improved. For example:

SELECT COUNT(1) FROM c WHERE country_code="FR" AND calculated.flag=1 LIMIT 1

Also, do not forget to carefully analyse your query execution, I am not sure in Cosmos, but like PostreSQL approach, EXPLAIN ANALYSE. Be also sure you are using the best type of variables, for example, varchar(2) instead of varchar(3). I would recommend to change character types of the countries per numbers, if you are filtering them (as you point out). For example, FR=1, GR=2 and so on. This will also improve performance. Finally, if country code and calculated flag are related, create a unique variable defining them. If nothing of these work, check for client performance, and even hardware.


Two ideas:

Try running the following, see if you get different run times:

SELECT COUNT(1) FROM c WHERE country_code="FR"

Important! The calculated.flag1 field, if it's not persistent, could give out the issue - as for each document/record - the DB engine has to calculate the result, hence the high RU.Can you optimize the calculated fields? (break them down, or do the calculation as part of the query?)

2nd suggestion would be to try and make you have defined a composite index

{          "automatic":true,        "indexingMode":"Consistent",        "includedPaths":[              {                  "path":"/*"            }        ],        "excludedPaths":[          ],        "compositeIndexes":[              [                  {                      "path":"/country_code",                    "order":"ascending"                },                {                      "path":"/calculated",                    "order":"descending"                }            ]        ]    }

Please also see Composite indexing policy examples

And Manage indexing policies in Azure Cosmos DB to see where you edit it