
Azure Cosmos DB - Understanding Partition Key


Honestly, the video here* was a MAJOR help in understanding partitioning in Cosmos DB.

But, in a nutshell: the PartitionKey is a property that exists on every single object and is best used to group similar objects together.

Good examples include Location (like City), Customer Id, Team, and more. Naturally, it wildly depends on your solution; so perhaps if you were to post what your object looks like we could recommend a good partition key.

EDIT: Should be noted that PartitionKey isn't required for collections under 10GB. (thanks David Makogon)


* The video used to live on this MS docs page entitled, "Partitioning and horizontal scaling in Azure Cosmos DB", but has since been removed. A direct link has been provided, above.


The partition key determines the logical partition: all items that share the same partition key value belong to the same logical partition.

Now, what is a logical partition, you may ask? How you define it depends on your requirements. Suppose your data can be categorized by customer: the customer "Id" then acts as the partition key, and each customer's data is placed in the logical partition for that customer Id.
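
To make the grouping concrete, here is a minimal in-memory sketch (it does not touch Cosmos DB at all). The Order type, its fields, and the sample values are hypothetical and only illustrate how items sharing a CustomerId end up together; it assumes C# 9 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical document shape: CustomerId plays the role of the partition key.
public record Order(string Id, string CustomerId, decimal Total);

public static class LogicalPartitionDemo
{
    // Groups items by partition key value; each group stands in for one logical partition.
    public static Dictionary<string, List<Order>> GroupByPartitionKey(IEnumerable<Order> orders) =>
        orders.GroupBy(o => o.CustomerId)
              .ToDictionary(g => g.Key, g => g.ToList());

    public static void Main()
    {
        var orders = new List<Order>
        {
            new Order("1", "customer-a", 10.0m),
            new Order("2", "customer-b", 25.5m),
            new Order("3", "customer-a", 7.25m),
        };

        // Print how many items landed in each (simulated) logical partition.
        foreach (var pair in GroupByPartitionKey(orders))
        {
            Console.WriteLine($"{pair.Key}: {pair.Value.Count} item(s)");
        }
    }
}
```

Both orders for customer-a land in the same group, which is the property a real logical partition gives you.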

What effect does this have on the query?

When querying, you pass the partition key value in the feed options instead of including it in your filter.

E.g., if your query was

SELECT * FROM T WHERE T.CustomerId= 'CustomerId';

it will now be

var options = new FeedOptions { PartitionKey = new PartitionKey(CustomerId) };
var query = _client.CreateDocumentQuery(CollectionUri, "SELECT * FROM T", options).AsDocumentQuery();


Cosmos DB can store a practically unlimited amount of data. Behind the scenes, it does this by using the partition key to spread the data across partitions. Is the partition key the same as a primary key? No.

Primary key: uniquely identifies the data.

Partition key: helps in sharding the data (for example, one partition for the city New York when city is the partition key).
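
A rough way to picture the difference: as far as I understand, in Cosmos DB an item's id only needs to be unique within its logical partition, so an item is addressed by the pair (partition key, id). The sketch below is an in-memory stand-in, not SDK code; ToyContainer and TryInsert are hypothetical names, and it assumes .NET Core 2.0 or later for Dictionary.TryAdd.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory stand-in for a container keyed by (partition key, id).
public sealed class ToyContainer
{
    private readonly Dictionary<(string City, string Id), string> _items =
        new Dictionary<(string City, string Id), string>();

    // Returns false when an item with the same id already exists in that partition.
    public bool TryInsert(string city, string id, string payload) =>
        _items.TryAdd((city, id), payload);
}

public static class KeyDemo
{
    public static void Main()
    {
        var container = new ToyContainer();
        Console.WriteLine(container.TryInsert("New York", "1", "first"));  // new item
        Console.WriteLine(container.TryInsert("Boston", "1", "second"));   // same id, different partition
        Console.WriteLine(container.TryInsert("New York", "1", "third"));  // same id, same partition
    }
}
```

The first two inserts succeed because they fall into different partitions; the third is rejected because the id is already taken within "New York".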

Each logical partition has a size limit (10 GB at the time of writing), so the more evenly the data is spread across partitions, the more the container can hold. The trade-off is that a query spanning many partitions has to contact each of them. For example, a query that reads from a single partition will generally be faster than one that has to gather data from multiple partitions.
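
The cost difference can be sketched with a toy model that simply counts how many partitions a query has to visit. This is an illustration only: ToyPartitionedStore, QuerySingle, and QueryAll are hypothetical names, and the real service is considerably more involved than a dictionary of lists.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class ToyPartitionedStore
{
    // Partition key value -> items stored in that partition.
    private readonly Dictionary<string, List<string>> _partitions =
        new Dictionary<string, List<string>>();

    public void Add(string partitionKey, string item)
    {
        if (!_partitions.TryGetValue(partitionKey, out var items))
        {
            items = new List<string>();
            _partitions[partitionKey] = items;
        }
        items.Add(item);
    }

    // Query scoped to one partition key: only that partition is visited.
    public (List<string> Items, int PartitionsVisited) QuerySingle(
        string partitionKey, Func<string, bool> filter)
    {
        if (_partitions.TryGetValue(partitionKey, out var items))
        {
            return (items.Where(filter).ToList(), 1);
        }
        return (new List<string>(), 0);
    }

    // Query without a partition key: every partition has to be visited.
    public (List<string> Items, int PartitionsVisited) QueryAll(Func<string, bool> filter)
    {
        var matches = _partitions.Values.SelectMany(p => p).Where(filter).ToList();
        return (matches, _partitions.Count);
    }
}

public static class FanOutDemo
{
    public static void Main()
    {
        var store = new ToyPartitionedStore();
        store.Add("New York", "order-1");
        store.Add("Boston", "order-2");
        store.Add("Seattle", "order-3");

        var single = store.QuerySingle("Boston", _ => true);
        var cross = store.QueryAll(_ => true);
        Console.WriteLine($"single-partition query visited {single.PartitionsVisited} partition(s)");
        Console.WriteLine($"cross-partition query visited {cross.PartitionsVisited} partition(s)");
    }
}
```

With three cities stored, the scoped query touches one partition while the unscoped one touches all three, which is the fan-out that makes cross-partition queries slower.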