How to speed up a Cosmos DB aggregate query?

azure nosql azure-cosmosdb database-performance

The Cosmos DB team has made significant changes to aggregation performance and to how indexes are used. This is their indexing "v2" strategy, which was rolled out only recently; it may not be available to all accounts yet, so contact Microsoft if you have an older database that needs upgrading.

You can compare the new results to the picture I originally posted.

You'll note that document load time now shows as 0 ms and the retrieved document size as 0 bytes. I can confirm the load really is fast now, so it may well be under 1 ms when measured server-side. A document size of 0 also makes more sense, since no documents need to be retrieved for this query: the count is served from the index alone.
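If you want to check these two numbers programmatically rather than from a screenshot, here is a minimal Python sketch. It assumes the query metrics arrive as a semicolon-separated `key=value` string (the shape of the query-metrics response header as I understand it); the sample string and the helper names `parse_query_metrics` and `served_from_index` are made up for illustration and are not real measurements or SDK functions.

```python
def parse_query_metrics(header: str) -> dict:
    """Parse a 'key=value;key=value' metrics string into a dict of floats."""
    metrics = {}
    for pair in header.split(";"):
        if not pair.strip():
            continue  # tolerate empty segments such as a trailing ';'
        key, _, value = pair.partition("=")
        metrics[key.strip()] = float(value)
    return metrics


def served_from_index(metrics: dict) -> bool:
    """True when no document bytes were loaded, i.e. the index alone answered."""
    return metrics.get("retrievedDocumentSize", 0.0) == 0.0


# Illustrative values only (assumed format, not copied from a real response).
sample = "totalExecutionTimeInMs=1.50;documentLoadTimeInMs=0.00;retrievedDocumentSize=0;outputDocumentCount=1"
metrics = parse_query_metrics(sample)
print(metrics["documentLoadTimeInMs"], served_from_index(metrics))
```

A retrieved document size of 0 is the signal to look for: it means the engine never had to touch the documents themselves.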

Finally, you can see that the RU charge dropped from 3222 to 7.4, a pretty drastic difference.
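To put that drop in perspective, the arithmetic on the two RU figures quoted above works out as follows (a trivial Python sketch; the variable names are mine):

```python
old_ru = 3222.0  # RU charge before the indexing change
new_ru = 7.4     # RU charge after the indexing change

reduction_factor = old_ru / new_ru
percent_saved = (1 - new_ru / old_ru) * 100

print(f"{reduction_factor:.0f}x cheaper, {percent_saved:.2f}% fewer RUs")
# prints "435x cheaper, 99.77% fewer RUs"
```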

Summing multiple columns at once within a single partition is also quite performant now: we can run about 8 sums at once across 2 million documents for ~50 RUs, and it takes about 20-70 ms when measured from a function API endpoint (so that includes network time).
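For anyone wanting to try the multi-column sum, here is a rough Python sketch of how such a query could be built and run against one partition. The property names (`amount`, `tax`, `country_code`) are hypothetical, and `run_multi_sum` assumes a container object with a `query_items(query=..., parameters=..., partition_key=...)` method like the one in the azure-cosmos Python SDK; treat it as a sketch, not as our production code.

```python
def build_multi_sum_query(fields, partition_field):
    """Build one SELECT that sums several top-level numeric properties."""
    if not fields:
        raise ValueError("at least one field is required")
    for name in list(fields) + [partition_field]:
        if not name.isidentifier():
            # Keeps the sketch simple: only plain top-level property names.
            raise ValueError(f"unsupported property name: {name!r}")
    sums = ", ".join(f"SUM(c.{f}) AS {f}_sum" for f in fields)
    return f"SELECT {sums} FROM c WHERE c.{partition_field} = @pk"


def run_multi_sum(container, fields, partition_field, partition_value):
    """Run the sums inside a single partition and return the one result row."""
    items = container.query_items(
        query=build_multi_sum_query(fields, partition_field),
        parameters=[{"name": "@pk", "value": partition_value}],
        partition_key=partition_value,  # scope the query to one partition
    )
    return next(iter(items))


print(build_multi_sum_query(["amount", "tax"], "country_code"))
```

Passing the partition key keeps the query single-partition, which is the case described above as performing well.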

More work still needs to be done by the Cosmos DB team to allow cross-partition multi-column aggregations, but the improvements we have now are quite promising.


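For the specific query shown, there is little to trim: an aggregate without GROUP BY already returns a single result, so a row limit adds nothing, and Cosmos DB's SQL dialect does not accept a bare LIMIT clause anyway (LIMIT is only valid together with OFFSET). Properties also have to be referenced through the container alias. A valid form of the query would be:

SELECT VALUE COUNT(1) FROM c WHERE c.country_code = "FR" AND c.calculated.flag = 1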

Also, do not forget to analyse your query execution carefully. I am not sure what the Cosmos DB equivalent is, but the idea is the same as EXPLAIN ANALYSE in PostgreSQL. Also make sure you are using the most compact data types you can, for example varchar(2) instead of varchar(3) in a relational database. I would recommend replacing the country strings with numeric codes if you filter on them (as you point out), for example FR=1, GR=2 and so on; this may also improve performance. Finally, if the country code and the calculated flag are related, create a single property that combines them. If none of this works, check client performance, and even hardware.


Two ideas:

First, try running the following and see whether you get different run times:

SELECT COUNT(1) FROM c WHERE c.country_code = "FR"

Important! If the calculated.flag field is not persisted, it could be the cause of the issue: the DB engine would have to calculate the result for each document/record, hence the high RU charge. Can you optimize the calculated fields (break them down, or do the calculation as part of the query)?

The second suggestion is to make sure you have defined a composite index:

{
    "automatic": true,
    "indexingMode": "Consistent",
    "includedPaths": [
        {
            "path": "/*"
        }
    ],
    "excludedPaths": [],
    "compositeIndexes": [
        [
            {
                "path": "/country_code",
                "order": "ascending"
            },
            {
                "path": "/calculated",
                "order": "descending"
            }
        ]
    ]
}
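If you would rather generate the policy than hand-edit JSON, the following Python sketch builds the same structure from a list of (path, order) pairs. The helper name `composite_index_policy` is hypothetical; the output is just a dictionary you could serialize and paste into the indexing policy editor, and nothing here talks to Cosmos DB itself.

```python
import json


def composite_index_policy(paths_with_order):
    """Return an indexing policy dict with one composite index over the given paths."""
    composite = []
    for path, order in paths_with_order:
        if not path.startswith("/"):
            raise ValueError(f"index paths start with '/': {path!r}")
        if order not in ("ascending", "descending"):
            raise ValueError(f"unknown sort order: {order!r}")
        composite.append({"path": path, "order": order})
    return {
        "automatic": True,
        "indexingMode": "Consistent",  # spelled as in the policy shown above
        "includedPaths": [{"path": "/*"}],
        "excludedPaths": [],
        "compositeIndexes": [composite],  # a list of composite indexes, here just one
    }


policy = composite_index_policy(
    [("/country_code", "ascending"), ("/calculated", "descending")]
)
print(json.dumps(policy, indent=4))
```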

Please also see "Composite indexing policy examples", and "Manage indexing policies in Azure Cosmos DB" to find where you edit the policy.
