Azure CloudAppendBlob errors with concurrent access

After a bit more searching, it looks like this is an actual problem.

I guess append blob storage is fairly new. (There are also other issues with append blobs at the moment; see

http://blogs.msdn.com/b/windowsazurestorage/archive/2015/09/02/issue-in-azure-storage-client-library-5-0-0-and-5-0-1-preview-in-appendblob-functionality.aspx)

Anyway, I fixed the issue by using the AppendBlock variant rather than AppendText, as suggested here:

https://azurekan.wordpress.com/2015/09/08/issues-with-adding-text-to-azure-storage-append-blob/

Here is the change to the AppendText method, which passes the unit test defined above:

    public void AppendText(string filename, string text)
    {
        if (string.IsNullOrWhiteSpace(filename))
            throw new ArgumentException("filename cannot be null or empty");

        if (!string.IsNullOrEmpty(text))
        {
            CloudAppendBlob cab = m_BlobStorage.BlobContainer.GetAppendBlobReference(filename);

            // Create the blob if it doesn't exist yet
            if (!cab.Exists())
            {
                try
                {
                    cab.CreateOrReplace(AccessCondition.GenerateIfNotExistsCondition(), null, null);
                }
                // Another writer may have created the blob between Exists() and CreateOrReplace();
                // the if-not-exists condition then makes this call fail, which is safe to ignore.
                catch (StorageException) { }
            }

            // Use AppendBlock, as AppendText seems to have an error at the moment.
            using (MemoryStream ms = new MemoryStream(Encoding.UTF8.GetBytes(text)))
            {
                cab.AppendBlock(ms);
            }
        }
    }

c# azure concurrency blobstorage

The append methods of the CloudAppendBlob class include:

AppendBlock/AppendFromByteArray/AppendFromFile/AppendFromStream/AppendText

Essentially, they all use the same REST API endpoint; see the documentation: https://docs.microsoft.com/en-us/rest/api/storageservices/append-block

However, only AppendBlock should be used in a multi-writer scenario; all the others should only be used in a single-writer scenario. The reason is that AppendBlock does NOT send the x-ms-blob-condition-appendpos header with the PUT request.

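The x-ms-blob-condition-appendpos header basically says: this block MUST be appended at exactly this offset of the blob. If the blob's current length is different, the service rejects the request with 412 (Precondition Failed). (The similarly named x-ms-blob-append-offset is a response header that reports the offset where the block was actually committed.)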
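To make the rule concrete, here is a hypothetical in-memory model of the service-side check. It is not the Azure SDK or the real service, and `TryAppend` is a made-up name for illustration. A request carrying an append position (the x-ms-blob-condition-appendpos header in the captured PUT request) succeeds only if that position equals the blob's current length; a request without one always lands at the current end.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory stand-in for an append blob; NOT the Azure SDK.
var blob = new List<byte>();

// Models the Append Block rule:
// - appendPos == null: no append-position condition, so append at the current end.
// - appendPos != null: append only if it equals the current length, otherwise 412.
int TryAppend(List<byte> target, byte[] data, long? appendPos)
{
    if (appendPos.HasValue && appendPos.Value != target.Count)
    {
        return 412; // Precondition Failed (append position condition not met)
    }
    target.AddRange(data);
    return 201; // Created
}

int first = TryAppend(blob, new byte[] { 1, 2, 3 }, 0);      // offset 0 matches the empty blob
int stale = TryAppend(blob, new byte[] { 4 }, 0);            // offset 0 is already taken
int unconditional = TryAppend(blob, new byte[] { 5 }, null); // no condition: lands at offset 3

Console.WriteLine($"{first} {stale} {unconditional} length={blob.Count}");
```

The stale request fails only because it pinned an offset; the unconditional one is appended wherever the blob currently ends.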

So for AppendBlock, the HTTP request looks like this:

    PUT https://test.blob.core.windows.net/test/20180323.log?comp=appendblock HTTP/1.1
    User-Agent: Azure-Storage/9.1.0 (.NET CLR 4.0.30319.42000; Win32NT 6.2.9200.0)
    x-ms-version: 2017-07-29
    x-ms-client-request-id: bb7f5a93-191d-40f9-8b92-4ec0476be920
    x-ms-date: Fri, 23 Mar 2018 20:21:29 GMT
    Authorization: SharedKey XXXXX
    Host: test.blob.core.windows.net
    Content-Length: 99

All the other append methods send the x-ms-blob-condition-appendpos header, whose value must be the current length of the blob before the append. So how does the library know that value? It first sends a HEAD request to fetch it:

    HEAD http://test.blob.core.windows.net/test/20180323.log HTTP/1.1
    User-Agent: Azure-Storage/9.1.0 (.NET CLR 4.0.30319.42000; Win32NT 6.2.9200.0)
    x-ms-version: 2017-07-29
    x-ms-client-request-id: 1cdb3731-9d72-41ab-afee-d4f462e9b0c2
    x-ms-date: Fri, 23 Mar 2018 20:29:19 GMT
    Authorization: SharedKey XXXX
    Host: test.blob.core.windows.net

The Content-Length value in the HEAD response becomes the value of the x-ms-blob-condition-appendpos header in the subsequent PUT request:

    PUT http://test.blob.core.windows.net/test/20180323.log?comp=appendblock HTTP/1.1
    User-Agent: Azure-Storage/9.1.0 (.NET CLR 4.0.30319.42000; Win32NT 6.2.9200.0)
    x-ms-version: 2017-07-29
    x-ms-blob-condition-appendpos: 1287
    x-ms-client-request-id: 1cdb3731-9d72-41ab-afee-d4f462e9b0c2
    x-ms-date: Fri, 23 Mar 2018 20:29:20 GMT
    Authorization: SharedKey XXXXX
    Host: test.blob.core.windows.net
    Content-Length: 99

So in the original question, when two parallel tasks call AppendText at the same time, both will most likely send their HEAD requests before either PUT arrives, and therefore read the same blob length. The task whose PUT reaches the service first succeeds; the other one fails because the blob's length has already changed and the offset it asked for has already been taken by the first PUT.
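The interleaving described above can be reproduced deterministically with a hypothetical in-memory model (again not the real SDK or service; `TryAppend` and the variable names are illustrative). Both writers read the length first, standing in for the HEAD request, and then each sends its conditional PUT. The second writer is rejected. Dropping the condition, which is what AppendBlock does, lets both writes land.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical in-memory model of an append blob; NOT the Azure SDK or the real service.
int TryAppend(List<byte> target, byte[] data, long? appendPos)
{
    // Mirrors the append-position check: a mismatching offset is rejected with 412.
    if (appendPos.HasValue && appendPos.Value != target.Count) return 412;
    target.AddRange(data);
    return 201;
}

// Conditional path (AppendText and friends): HEAD first, then PUT with the observed length.
var conditional = new List<byte>(Encoding.UTF8.GetBytes("header\n"));
long seenByA = conditional.Count; // writer A's HEAD
long seenByB = conditional.Count; // writer B's HEAD, issued before A's PUT arrives
int statusA = TryAppend(conditional, Encoding.UTF8.GetBytes("A\n"), seenByA);
int statusB = TryAppend(conditional, Encoding.UTF8.GetBytes("B\n"), seenByB);

// Unconditional path (AppendBlock): no offset is sent, so both appends succeed.
var unconditional = new List<byte>(Encoding.UTF8.GetBytes("header\n"));
int statusC = TryAppend(unconditional, Encoding.UTF8.GetBytes("A\n"), null);
int statusD = TryAppend(unconditional, Encoding.UTF8.GetBytes("B\n"), null);

Console.WriteLine($"conditional: A={statusA} B={statusB}");
Console.WriteLine($"unconditional: C={statusC} D={statusD}");
```

In the unconditional case the order of the two lines depends only on which PUT arrives first, which is exactly the lack of position control mentioned next.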

So if you have a multi-writer scenario, AppendBlock is the method that works right now. But be aware that:

- you have no control over the position of the block in the blob;
- each block has a size limit (4 MB for the x-ms-version 2017-07-29 used in the traces above), so if you use AppendBlock to upload more than 4 MB, the request fails with the response: HTTP/1.1 413 The request body is too large and exceeds the maximum permissible limit;
- if you upload large data with any append method other than AppendBlock, it sends one HEAD request to get the blob length, then automatically splits the data into multiple PUT requests. The block size is controlled by CloudAppendBlob.StreamWriteSizeInBytes, which defaults to 4 MB if you don't set it.

As the name AppendBlock hints, it appends exactly one block, never more. So if you want to upload large data with it, you have to split the data yourself. But in a multi-writer scenario you cannot guarantee that the split blocks will end up together in the blob.
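A minimal sketch of splitting a payload yourself before calling AppendBlock, assuming the 4 MB (4 * 1024 * 1024 byte) per-block limit discussed above. `SplitIntoBlocks` is a hypothetical helper, not part of the storage library; it only computes the chunks, and the upload itself is left out.

```csharp
using System;
using System.Collections.Generic;

// Assumed per-block limit for the service version discussed above (4 MiB).
const int MaxBlockBytes = 4 * 1024 * 1024;

// Hypothetical helper: cut a payload into consecutive chunks no larger than maxBlock.
// Each chunk would then be sent with its own AppendBlock call.
List<ArraySegment<byte>> SplitIntoBlocks(byte[] data, int maxBlock)
{
    if (maxBlock <= 0) throw new ArgumentOutOfRangeException(nameof(maxBlock));
    var result = new List<ArraySegment<byte>>();
    for (int offset = 0; offset < data.Length; offset += maxBlock)
    {
        int count = Math.Min(maxBlock, data.Length - offset);
        result.Add(new ArraySegment<byte>(data, offset, count));
    }
    return result;
}

var payload = new byte[10 * 1024 * 1024]; // 10 MiB of sample data
var chunks = SplitIntoBlocks(payload, MaxBlockBytes);
foreach (var chunk in chunks)
{
    Console.WriteLine($"offset={chunk.Offset} count={chunk.Count}");
}
```

Each chunk could be wrapped with `new MemoryStream(chunk.Array, chunk.Offset, chunk.Count)` and passed to AppendBlock. Because every chunk is a separate unconditional append, another writer's block may still land between two of them.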


For people who need a more generic solution to this problem, I created an extension method:

    public static async Task AppendTextConcurrentAsync(this CloudAppendBlob appendBlob, string content)
    {
        using (var stream = new MemoryStream(Encoding.UTF8.GetBytes(content)))
        {
            await appendBlob.AppendBlockAsync(stream);
        }
    }

This solution is more consistent with how you use other Append* methods on CloudAppendBlob.
