How can I shake relational database thinking for designing an azure table storage datastore?

c# azure azure-storage azure-table-storage

You can think of Azure tables as collections of objects.

In Azure Table parlance, an object is an entity.

To use your example, users would derive from TableStorageEntity.

Azure Table Storage is not relational: there are no joins. You can query it with LINQ from .NET, but join operations and referential integrity are not provided by the system. The developer must implement them.
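To make "the developer must implement them" concrete, here is a minimal in-memory sketch of an application-side join. The `User` and `Note` records, the `AppJoin.NotesWithAuthors` helper and the dictionary standing in for a users table are all hypothetical. This is plain C#, not the Azure storage client library.

```csharp
using System.Collections.Generic;

// Hypothetical entity shapes; plain records stand in for rows read from two tables.
public record User(string UserId, string Username);
public record Note(string NoteId, string UserId, string Text);

public static class AppJoin
{
    // Application-side "join": notes and users were fetched by separate queries,
    // and the code matches them up because the table service cannot.
    public static List<(string Username, string Text)> NotesWithAuthors(
        IReadOnlyDictionary<string, User> usersById,
        IEnumerable<Note> notes)
    {
        var result = new List<(string Username, string Text)>();
        foreach (var note in notes)
        {
            // No referential integrity: a note may point at a user that no longer
            // exists, so the code decides what to do (here it skips the note).
            if (usersById.TryGetValue(note.UserId, out var user))
                result.Add((user.Username, note.Text));
        }
        return result;
    }
}
```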

Some significant advantages:

(1) Azure tables automatically scale across multiple storage nodes to maintain performance, even if you are dealing with billions of entities.
(2) They are replicated 3 times.
(3) They come with an SLA.
(4) The Table service exposes a REST API, so tables can be accessed from non-Microsoft technologies.

To allow your objects to be stored in Azure tables, you simply need to derive from TableStorageEntity.

More information can be found if you search for "Microsoft Azure tables virtual labs".

The snippet below ignores the partition key and the row key, but they are something you need to worry about. Think of the two keys together as the primary key of a relational table.

You need to choose these two keys very carefully, because they determine performance. Since you only get one set of keys, you may need to keep denormalized copies of data for best performance.

    public class Users : TableStorageEntity
    {
        // PartitionKey and RowKey (from the base class) are not set in this snippet.
        public int ID { get; set; }
        public string Username { get; set; }
    }

Check out the hands-on labs. Azure tables are cheap and easy to use.


Some thoughts ...

Think of distinct entities in their entirety, and abstain from decomposing them further using any normalization techniques.
Come up with a single identifier for each entity that, if indexed, would support both an exact key search and a range key search.
Split the identifier into two segments for Azure table storage scalability needs. How to split well is a separate topic of its own, but splitting along well-defined natural segments usually works sufficiently well.

In your example, the two entities would be User and Note.

A UserId would be sufficient to uniquely identify a User. A range search on users may not be very useful. The user id could be any fixed-length value here.

A UserId + NoteId would be sufficient to uniquely identify a note. The note id could be something like a date/timestamp + a GUID for uniqueness. Such a key, in combination with the UserId would uniquely identify the note as well as allow a range search on all the user’s notes or a user’s notes over a given time period.

So if UserId = “ABCD”, NoteId could be “20120801-00f64829-6044-4fbb-8b4e-ae82ae15096e”.
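A minimal sketch of building such a NoteId, assuming a UTC date and the `yyyyMMdd-<guid>` shape from the example above. `NoteKeys.NewNoteId` is a hypothetical helper, not part of any Azure library.

```csharp
using System;
using System.Globalization;

public static class NoteKeys
{
    // Builds "yyyyMMdd-<guid>". The date prefix makes ordinal string order match
    // chronological order; the GUID keeps two notes on the same day distinct.
    public static string NewNoteId(DateTime utcDate, Guid id) =>
        utcDate.ToString("yyyyMMdd", CultureInfo.InvariantCulture) + "-" + id.ToString("D");

    // Convenience overload that generates a fresh GUID.
    public static string NewNoteId(DateTime utcDate) => NewNoteId(utcDate, Guid.NewGuid());
}
```

Passing the GUID in explicitly, as the first overload does, keeps the function deterministic and easy to test.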

You could store both entities in the same or in different tables. Here are some different approaches …

If each entity has its own table,

For a User, the Partition Key could be “ABCD” and the Row Key could be anything at all, since you search only on the partition key.
Or the Partition Key could be “AB” and the Row Key “CD”.
Both of the above would scale well for a large number of users.
Or Partition Key could be “*” and Row Key could be “ABCD”. This would work quite well for a smaller set of users and you could put both users and notes in the same table.
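The three user layouts above could be expressed as key-building helpers along these lines. `UserKeys` and its methods are hypothetical, and using an empty string for the unused RowKey in the first layout is an assumption.

```csharp
using System;

public static class UserKeys
{
    // Layout 1: the whole UserId is the PartitionKey; the RowKey carries no meaning.
    public static (string PartitionKey, string RowKey) WholeIdAsPartition(string userId) =>
        (userId, "");

    // Layout 2: split a fixed-length UserId into two segments, e.g. "ABCD" -> ("AB", "CD").
    public static (string PartitionKey, string RowKey) SplitId(string userId, int partitionLength)
    {
        if (partitionLength <= 0 || partitionLength >= userId.Length)
            throw new ArgumentOutOfRangeException(nameof(partitionLength));
        return (userId.Substring(0, partitionLength), userId.Substring(partitionLength));
    }

    // Layout 3: one shared partition ("*") for all users; fine for a small user set.
    public static (string PartitionKey, string RowKey) SinglePartition(string userId) =>
        ("*", userId);
}
```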

For a Note:

Partition Key could be “ABCD” and Row Key could be “20120801-00f64829-6044-4fbb-8b4e-ae82ae15096e”
Range searches here could be:
- On PartitionKey = “ABCD” to get all the notes for a user.
- On PartitionKey = “ABCD” and RowKey >=“20120801” and RowKey <= “20120901” to get notes within a date range.
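A sketch of those range searches, assuming the keys above. `NoteQueries` is hypothetical: `DateRangeFilter` produces OData `$filter` text in the form the Table service accepts, and `DateRange` models the same query in memory with ordinal string comparison, since table keys are strings compared lexically. With an inclusive upper bound of "20120901", notes from 1 September itself are excluded, because "20120901-..." sorts after "20120901".

```csharp
using System.Collections.Generic;
using System.Linq;

public static class NoteQueries
{
    // OData filter for "all notes of a user between two dates".
    // Single quotes inside string literals are escaped by doubling them.
    public static string DateRangeFilter(string userId, string fromDate, string toDate) =>
        $"PartitionKey eq '{Esc(userId)}' and RowKey ge '{Esc(fromDate)}' and RowKey le '{Esc(toDate)}'";

    static string Esc(string s) => s.Replace("'", "''");

    // In-memory model of the same query: exact match on PartitionKey,
    // inclusive ordinal range on RowKey.
    public static IEnumerable<(string PartitionKey, string RowKey)> DateRange(
        IEnumerable<(string PartitionKey, string RowKey)> table,
        string userId, string fromDate, string toDate) =>
        table.Where(e => e.PartitionKey == userId
                      && string.CompareOrdinal(e.RowKey, fromDate) >= 0
                      && string.CompareOrdinal(e.RowKey, toDate) <= 0);
}
```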

UPDATE

I misread your question and assumed only a one-to-many relationship between users and notes. Since there is a many-to-many relationship, you would need 4 entities to model it, unless you do not mind duplication. (If notes are short and immutable, they can be duplicated and you would not have to model joins.)

More than one entity type can be placed in a single table if their keys are in different key ranges and can be easily distinguished. In practice, though, this is not common unless there is a specific need, usually transactional writes to the same partition (not applicable here).

So a single-table schema could look like this (for multiple tables, the Partition Key prefix could be dropped):

    Entity      Partition Key             Row Key
    User        “U” + UserId
    Note        “N” + NoteId(Date)        NoteId(GUID)
    User Note   “X” + UserId              NoteId(Date+GUID)
    Note User   “Y” + NoteId(Date+GUID)   UserId

You could also model this in 3 tables: one for User, one for Notes, and one for the relationships in both directions.
You could also model this partially in SQL and partially in Azure storage, with notes and user data in blobs or tables and the relationships in SQL.
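The prefixed single-table layout can be captured in a few key-building helpers. This is a minimal sketch: `SingleTableKeys` is hypothetical, an empty RowKey for the User entity is an assumption, and with one table per entity you would simply drop the prefixes.

```csharp
using System;

// Key helpers for the prefixed single-table layout. The one-letter prefixes
// ("U", "N", "X", "Y") keep each entity type in its own key range.
public static class SingleTableKeys
{
    public static (string PartitionKey, string RowKey) User(string userId) =>
        ("U" + userId, "");                       // RowKey unused (empty) here

    public static (string PartitionKey, string RowKey) Note(string noteDate, string noteGuid) =>
        ("N" + noteDate, noteGuid);

    // A user-note relationship is stored twice, once per query direction.
    public static (string PartitionKey, string RowKey) UserNote(string userId, string noteId) =>
        ("X" + userId, noteId);

    public static (string PartitionKey, string RowKey) NoteUser(string noteId, string userId) =>
        ("Y" + noteId, userId);

    // Recovers the entity type from a PartitionKey read back from the table.
    public static string EntityType(string partitionKey)
    {
        if (string.IsNullOrEmpty(partitionKey))
            throw new ArgumentException("Empty key", nameof(partitionKey));
        return partitionKey[0] switch
        {
            'U' => "User",
            'N' => "Note",
            'X' => "User Note",
            'Y' => "Note User",
            _ => throw new ArgumentException("Unknown key prefix", nameof(partitionKey)),
        };
    }
}
```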

These are some alternatives and you will want to determine which fits your data and your needs best.

YET ANOTHER UPDATE
Actually, 3 entities should be enough if the note itself is stored in the User Note entity.

If UserId = GUID
And NoteId = Date+GUID

    Entity      Partition Key   Row Key             Note   User
    User        UserId
    User Note   UserId          NoteId(Date+GUID)   Note
    Note User   NoteId(Date)    NoteId(GUID)               UserId

User Note contains the note, so you can query for all notes for a user.
Note User lets you query for all users of a note; join on “User Note” to get the note.


Why does UsersNotes have an ID? Why not just UserID, NoteID as a composite primary key?

So: three tables with two properties each, the first being the PartitionKey and the second the RowKey.

If you expect to query on NoteID to get UserIDs a lot, then add a 4th table, because a search on PartitionKey is faster than a search on RowKey. It will usually be cheaper too, as it results in fewer transactions, but you pay for the transactions to load that table.

    public class NotesUsers : TableStorageEntity
    {
        // NoteID plays the PartitionKey role and UserID the RowKey role.
        public int NoteID { get; set; }
        public int UserID { get; set; }
    }

And for the Users table go with UserName as the PartitionKey if that is the common query condition.
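A minimal in-memory sketch of keeping the relationship in two tables, one per query direction. `KeyTable` and `UserNoteLinks` are hypothetical stand-ins, not the Azure SDK. Because PartitionKey and RowKey are strings, the integer IDs are zero-padded here so that string order matches numeric order. In the real service the two writes go to different tables, so a single entity group transaction (limited to one partition of one table) cannot cover both; the application has to handle a failure between them.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// In-memory stand-in for one table: PartitionKey -> sorted set of RowKeys.
// It models only key lookups, not the storage service itself.
public class KeyTable
{
    private readonly Dictionary<string, SortedSet<string>> _partitions = new();

    public void Insert(string partitionKey, string rowKey)
    {
        if (!_partitions.TryGetValue(partitionKey, out var rows))
            _partitions[partitionKey] = rows = new SortedSet<string>(StringComparer.Ordinal);
        rows.Add(rowKey);
    }

    // Models a query on PartitionKey only (the fast, seek-like case).
    public IReadOnlyCollection<string> RowKeys(string partitionKey) =>
        _partitions.TryGetValue(partitionKey, out var rows)
            ? rows
            : (IReadOnlyCollection<string>)Array.Empty<string>();
}

public static class UserNoteLinks
{
    // Zero-padded so that ordinal string order matches numeric order for non-negative IDs.
    public static string Key(int id) => id.ToString("D10", CultureInfo.InvariantCulture);

    // Writes the relationship to both tables so each direction is a partition query.
    public static void Link(KeyTable usersNotes, KeyTable notesUsers, int userId, int noteId)
    {
        usersNotes.Insert(Key(userId), Key(noteId));
        notesUsers.Insert(Key(noteId), Key(userId));
    }
}
```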

There is no declarative referential integrity in ATS; you will need to enforce all data relations in your application. Keys are a two-part composite (PartitionKey, RowKey). A search on the RowKey is like a scan (not a seek), whereas a search on the PartitionKey is like a seek.
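Since the service will not enforce relations, the checks live in your code. A minimal in-memory sketch of what that means for inserts and deletes; `NoteLinkStore` and its methods are hypothetical, and with real tables each existence check would be an extra point query and the cascade an extra set of deletes.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical application-level guard: nothing in the table service stops a
// link to a missing user or note from being written, so the code checks first.
public class NoteLinkStore
{
    private readonly HashSet<string> _userIds = new(StringComparer.Ordinal);
    private readonly HashSet<string> _noteIds = new(StringComparer.Ordinal);
    private readonly HashSet<(string UserId, string NoteId)> _links = new();

    public void AddUser(string userId) => _userIds.Add(userId);
    public void AddNote(string noteId) => _noteIds.Add(noteId);

    // Refuses to create a relationship whose endpoints do not exist.
    public void Link(string userId, string noteId)
    {
        if (!_userIds.Contains(userId)) throw new InvalidOperationException($"No user {userId}");
        if (!_noteIds.Contains(noteId)) throw new InvalidOperationException($"No note {noteId}");
        _links.Add((userId, noteId));
    }

    // No cascading delete in the service: removing a user must also remove its links.
    public void DeleteUser(string userId)
    {
        _userIds.Remove(userId);
        _links.RemoveWhere(l => l.UserId == userId);
    }

    public int LinkCount => _links.Count;
}
```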

But I would go with SQL. If notes are something people type, that is a relatively low volume of data, and it is relational data.
