How can I shake relational database thinking for designing an azure table storage datastore?


You can think of Azure tables as collections of objects.

In Azure Table parlance, an object is an entity.

To use your example, users would derive from TableStorageEntity.

Azure Table Storage is not relational. There are no joins and no referential integrity, so the developer must provide both in application code. There is LINQ, though, a query syntax available from the .NET languages.
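As a rough illustration of doing the join yourself, the sketch below fetches nothing from storage at all: it assumes the users and notes have already been loaded by two separate queries, and joins them in memory with LINQ to Objects. The `User` and `Note` classes and the `ClientSideJoin` helper are hypothetical names made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical plain classes standing in for entities already read from storage.
public class User { public string UserId; public string Username; }
public class Note { public string UserId; public string NoteId; public string Text; }

public static class ClientSideJoin
{
    // Inner join performed in application memory; the table service is not involved.
    public static List<string> Describe(IEnumerable<User> users, IEnumerable<Note> notes)
    {
        return (from n in notes
                join u in users on n.UserId equals u.UserId
                select u.Username + ": " + n.Text).ToList();
    }

    public static void Main()
    {
        var users = new[] { new User { UserId = "ABCD", Username = "alice" } };
        var notes = new[]
        {
            new Note { UserId = "ABCD", NoteId = "20120801", Text = "first note" },
            new Note { UserId = "ZZZZ", NoteId = "20120802", Text = "orphaned note" }
        };
        foreach (var line in Describe(users, notes))
            Console.WriteLine(line); // prints "alice: first note"
    }
}
```

The orphaned note is silently dropped by the inner join, which is exactly the kind of inconsistency the storage service will not stop you from creating.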

Some significant advantages:

  1. Azure tables automatically scale across multiple storage nodes to maintain performance, even if you are dealing with billions of entities.
  2. They are replicated 3 times.
  3. They come with an SLA.
  4. The Table service exposes a REST API, so tables can be accessed from non-Microsoft technologies.

To allow your objects to be stored in Azure tables, you simply need to derive from TableStorageEntity.

More information can be found if you search for "Microsoft Azure tables virtual labs".

The snippet below leaves out the partition key and the row key, but you do need to worry about them. Think of the two keys together as the primary key of a relational table.

You need to choose these two keys very carefully, because they determine performance. Since you only get one set of keys, you may need to keep denormalized copies of the data for best performance.

    public class Users : TableStorageEntity
    {
        public int ID { get; set; }
        public String Username { get; set; }
    }

Check out the hands-on labs. Azure tables are cheap and easy to use.


Some thoughts ...

  1. Think of distinct entities in their entirety, and abstain from decomposing them further using any normalization techniques.
  2. Come up with a single identifier for each entity that, if it were indexed, would support both an exact key search and a key range search.
  3. Split the identifier into 2 segments for Azure table storage scalability needs. How to split well is a separate topic of its own, but splitting along well-defined natural segments usually works sufficiently well.

In your example, the two entities would be User and Note.

A UserId would be sufficient to uniquely identify a User. A range search on a user may not be really useful. The user id could be any fixed length value here.

A UserId + NoteId would be sufficient to uniquely identify a note. The note id could be something like a date/timestamp + a GUID for uniqueness. Such a key, in combination with the UserId would uniquely identify the note as well as allow a range search on all the user’s notes or a user’s notes over a given time period.

So if UserId = “ABCD”, NoteId could be “20120801-00f64829-6044-4fbb-8b4e-ae82ae15096e”.
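A minimal sketch of building such a NoteId, assuming a `yyyyMMdd` date prefix followed by a hyphen and the GUID. The `NoteKeys` class and `MakeNoteId` method are hypothetical names, not part of any Azure library.

```csharp
using System;
using System.Globalization;

public static class NoteKeys
{
    // Date first (yyyyMMdd) so that string order follows date order;
    // the GUID suffix is only there to make the key unique.
    public static string MakeNoteId(DateTime created, Guid id)
    {
        return created.ToString("yyyyMMdd", CultureInfo.InvariantCulture) + "-" + id.ToString("D");
    }

    public static void Main()
    {
        var id = Guid.Parse("00f64829-6044-4fbb-8b4e-ae82ae15096e");
        Console.WriteLine(MakeNoteId(new DateTime(2012, 8, 1), id));
        // prints 20120801-00f64829-6044-4fbb-8b4e-ae82ae15096e
    }
}
```

A fixed-width date prefix matters here: keys are compared as strings, so a format like `2012-8-1` would not sort in date order.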

You could store both entities in the same or in different tables. Here are some different approaches …

If each entity has its own table,

  • For a User, the Partition Key could be “ABCD” and the Row Key could be anything at all, since you search only on the partition key.

  • Or Partition Key could be “AB” and Row Key could be “CD”.

    Both of the above would scale well for a large number of users.

  • Or Partition Key could be “*” and Row Key could be “ABCD”. This would work quite well for a smaller set of users and you could put both users and notes in the same table.
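The three User key options above can be sketched as plain string functions. This is only an illustration: `UserKeys` and its methods are hypothetical, each returns a (PartitionKey, RowKey) pair, and the empty row key in the first option is an assumption standing in for "anything".

```csharp
using System;

public static class UserKeys
{
    // Option 1: the whole id is the partition key; the row key is unused
    // (an empty string here, purely as an assumption).
    public static Tuple<string, string> WholeId(string userId)
    {
        return Tuple.Create(userId, "");
    }

    // Option 2: the first two characters pick the partition, the rest is the row key.
    public static Tuple<string, string> SplitId(string userId)
    {
        return Tuple.Create(userId.Substring(0, 2), userId.Substring(2));
    }

    // Option 3: one shared partition "*"; the id is the row key.
    public static Tuple<string, string> SinglePartition(string userId)
    {
        return Tuple.Create("*", userId);
    }

    public static void Main()
    {
        var keys = SplitId("ABCD");
        Console.WriteLine(keys.Item1 + " / " + keys.Item2); // prints "AB / CD"
    }
}
```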

For a Note

  • Partition Key could be “ABCD” and Row Key could be “20120801-00f64829-6044-4fbb-8b4e-ae82ae15096e”

  • Range search here could be

    • On PartitionKey = “ABCD” to get all the notes for a user.
    • On PartitionKey = “ABCD” and RowKey >=“20120801” and RowKey <= “20120901” to get notes within a date range.
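To make the date-range search concrete, here is a small sketch. `BuildFilter` produces filter text in the OData style that the Table service REST API accepts (`eq`, `ge`, `le`, `and`), and `RowKeysInRange` imitates the same range in memory using ordinal string comparison, which is assumed here as a stand-in for the service's key ordering. The class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NoteRangeQuery
{
    // Filter text for one user's notes between two row key bounds (inclusive).
    public static string BuildFilter(string userId, string fromKey, string toKey)
    {
        return "PartitionKey eq '" + userId + "' and RowKey ge '" + fromKey +
               "' and RowKey le '" + toKey + "'";
    }

    // In-memory imitation of the same range over a set of row keys.
    public static List<string> RowKeysInRange(IEnumerable<string> rowKeys, string fromKey, string toKey)
    {
        return rowKeys
            .Where(k => string.CompareOrdinal(k, fromKey) >= 0 &&
                        string.CompareOrdinal(k, toKey) <= 0)
            .OrderBy(k => k, StringComparer.Ordinal)
            .ToList();
    }

    public static void Main()
    {
        var keys = new[]
        {
            "20120715-11111111-1111-1111-1111-111111111111",
            "20120801-00f64829-6044-4fbb-8b4e-ae82ae15096e",
            "20120901-22222222-2222-2222-2222-222222222222"
        };
        Console.WriteLine(BuildFilter("ABCD", "20120801", "20120901"));
        foreach (var k in RowKeysInRange(keys, "20120801", "20120901"))
            Console.WriteLine(k); // only the 20120801 key is printed
    }
}
```

Note that a key such as “20120901-2222…” sorts after the bare bound “20120901”, so notes created on 1 September fall outside this range. If that day should be included, an upper bound such as “20120902” with a strict less-than comparison is one way to get it.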

UPDATE

I misread your question and assumed only a one-to-many relationship between users and notes. Since there is a many-to-many relationship, you would need 4 entities to model it, unless you do not mind duplication. (If notes are short and immutable, they can be duplicated and you would not have to model joins.)

More than 1 entity can be placed in a single table if the keys are in different key ranges and can be easily distinguished. In practice, though, this is not common unless there is a specific need, usually transactional writes to the same partition (not applicable here).

So a single table schema could look like this. For multiple tables, the Partition Key prefix could be dropped.

  • You could also model this in 3 tables, one for User, one for Notes and one for the relationships in both directions.
  • You could also model this partially in SQL and partially in Azure storage. Notes and User data in blobs or tables and relationships in SQL.

    Entity       Partition Key                Row Key
    User         “U” + UserId
    Note         “N” + NoteId(Date)           NoteId(GUID)
    User Note    “X” + UserId                 NoteId(Date+GUID)
    Note User    “Y” + NoteId(Date+GUID)      UserId

These are some alternatives and you will want to determine which fits your data and your needs best.
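As a sketch of the single-table layout with prefixed partition keys, the helper below builds the (PartitionKey, RowKey) pair for each of the four entity kinds. `SingleTableKeys` is a hypothetical name, the NoteId is assumed to have the `yyyyMMdd-<guid>` shape used earlier, and the empty row key for User is an assumption because the layout leaves it blank.

```csharp
using System;

public static class SingleTableKeys
{
    // NoteId is assumed to look like "yyyyMMdd-<guid>".
    private static string DatePart(string noteId) { return noteId.Substring(0, 8); }
    private static string GuidPart(string noteId) { return noteId.Substring(9); }

    public static Tuple<string, string> User(string userId)
    {
        // Row key is blank in the layout; an empty string is assumed here.
        return Tuple.Create("U" + userId, "");
    }

    public static Tuple<string, string> Note(string noteId)
    {
        return Tuple.Create("N" + DatePart(noteId), GuidPart(noteId));
    }

    public static Tuple<string, string> UserNote(string userId, string noteId)
    {
        return Tuple.Create("X" + userId, noteId);
    }

    public static Tuple<string, string> NoteUser(string noteId, string userId)
    {
        return Tuple.Create("Y" + noteId, userId);
    }

    public static void Main()
    {
        var noteId = "20120801-00f64829-6044-4fbb-8b4e-ae82ae15096e";
        var note = Note(noteId);
        Console.WriteLine(note.Item1 + " | " + note.Item2);
        var link = UserNote("ABCD", noteId);
        Console.WriteLine(link.Item1 + " | " + link.Item2);
    }
}
```

Because every entity kind starts with a different letter, a partition key range such as everything beginning with “X” selects only User Note rows.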

YET ANOTHER UPDATE
Actually 3 entities should be enough with the Note in the UserNote entity.

If UserId = GUID
And NoteId = Date+GUID

    Entity       Partition Key    Row Key              Note    User
    User         UserId
    User Note    UserId           NoteId(Date+GUID)    Note
    Note User    NoteId(Date)     NoteId(GUID)                 UserId

User Note contains the Note, so you can query for all notes for a user. Note User lets you query for all Users of a note; join on ‘User Note’ to get the note.


Why does UsersNotes have an ID? Why not just UserID, NoteID as a composite primary key?

So three tables with 2 properties each. The first is the PartitionKey and the second is the RowKey.

If you expect to query on NoteID to get UserIDs a lot, then add a 4th table keyed the other way around, because a search on PartitionKey is faster than a search on RowKey. It will also usually be cheaper, since it results in fewer transactions, although you do pay for the extra transactions needed to load that table.

    public class NotesUsers : TableStorageEntity
    {
        public int NoteID { get; set; }
        public int UserID { get; set; }
    }
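A rough in-memory sketch of keeping the relationship in both directions: each dictionary stands in for one table, mapping a PartitionKey to its sorted RowKeys. `LinkTables` and its members are hypothetical names; against real storage, the two writes in `Link` would be two separate transactions that your code has to keep consistent.

```csharp
using System;
using System.Collections.Generic;

public class LinkTables
{
    // UsersNotes: PartitionKey = UserID, RowKey = NoteID.
    private readonly Dictionary<string, SortedSet<string>> usersNotes =
        new Dictionary<string, SortedSet<string>>();
    // NotesUsers: PartitionKey = NoteID, RowKey = UserID.
    private readonly Dictionary<string, SortedSet<string>> notesUsers =
        new Dictionary<string, SortedSet<string>>();

    // Writes the link twice, once per direction.
    public void Link(string userId, string noteId)
    {
        Add(usersNotes, userId, noteId);
        Add(notesUsers, noteId, userId);
    }

    public IEnumerable<string> NotesOf(string userId) { return Get(usersNotes, userId); }
    public IEnumerable<string> UsersOf(string noteId) { return Get(notesUsers, noteId); }

    private static void Add(Dictionary<string, SortedSet<string>> table, string pk, string rk)
    {
        SortedSet<string> rows;
        if (!table.TryGetValue(pk, out rows))
        {
            rows = new SortedSet<string>(StringComparer.Ordinal);
            table[pk] = rows;
        }
        rows.Add(rk);
    }

    private static IEnumerable<string> Get(Dictionary<string, SortedSet<string>> table, string pk)
    {
        SortedSet<string> rows;
        return table.TryGetValue(pk, out rows) ? (IEnumerable<string>)rows : new string[0];
    }

    public static void Main()
    {
        var links = new LinkTables();
        links.Link("ABCD", "N1");
        links.Link("EFGH", "N1");
        links.Link("ABCD", "N2");
        Console.WriteLine(string.Join(", ", links.UsersOf("N1")));  // prints "ABCD, EFGH"
        Console.WriteLine(string.Join(", ", links.NotesOf("ABCD"))); // prints "N1, N2"
    }
}
```

Both lookups are by partition key, which is the point of the extra table: neither direction needs a search on RowKey alone.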

And for the Users table go with UserName as the PartitionKey if that is the common query condition.

There is no declarative referential integrity in ATS. You will need to enforce all data relations in your application. The key is a two-part composite: a search on the PartitionKey is like a seek, whereas a search on the RowKey alone is like a scan.

But I would go with SQL. If the notes are typed in by people, then that is a relatively low volume of data. And it is relational data.