
Azure storage incremental copy by modified date


I am not sure this is the correct answer, but I have resorted to this solution for now.

AzCopy is a bit faster, but because it is an executable I cannot run it in Azure Automation.

I wrote my own runbook (it can be adapted into a workflow) that implements the equivalent of the following AzCopy command:

    AzCopy /Source:$sourceUri /Dest:$destUri /SourceKey:$sourceStorageKey /DestKey:$destStorageAccountKey /S /XO /Y

  1. According to the List Blobs API, blobs can only be filtered by blob prefix, so I cannot pull blobs filtered by modified date. This leaves me to pull the whole blob list.
  2. I pull 20,000 blobs at a time from the source and the destination using Get-AzureStorageBlob with a ContinuationToken.
  3. I loop through the 20,000 source blobs and check whether each one is missing from the destination or has been modified in the source.
  4. If either condition in step 3 is true, I write that blob to the destination.
  5. It takes around 3-4 hours to go through 7 million blobs. The run takes longer depending on how many blobs have to be written to the destination.

A code snippet

    # Loop through the source container's blobs and copy to the destination
    # any blob that is missing there or has been modified in the source.
    $MaxReturn = 20000
    $Total = 0
    $Token = $null
    $FilesTransferred = 0
    $FilesTransferSuccess = 0
    $FilesTransferFail = 0
    $sw = [Diagnostics.Stopwatch]::StartNew()
    do
    {
        # @() guarantees an array even when zero or one blob is returned.
        $SrcBlobs = @(Get-AzureStorageBlob -Context $sourceContext -Container $container -MaxCount $MaxReturn -ContinuationToken $Token |
            Select-Object -Property Name, LastModified, ContinuationToken)

        # Index this page of destination blobs by name for fast lookups.
        # The destination is paged with the same token as the source, which
        # assumes both containers list the same names in the same order.
        $DestBlobsHash = @{}
        Get-AzureStorageBlob -Context $destContext -Container $container -MaxCount $MaxReturn -ContinuationToken $Token |
            Select-Object -Property Name, LastModified, ContinuationToken |
                ForEach-Object { $DestBlobsHash[$_.Name] = $_.LastModified.UtcDateTime }

        $Total += $SrcBlobs.Count
        if ($SrcBlobs.Count -le 0) {
            break
        }
        $Token = $SrcBlobs[$SrcBlobs.Count - 1].ContinuationToken

        foreach ($SrcBlob in $SrcBlobs) {
            # Copy when the destination page is empty, the blob is missing
            # from it, or the source blob is newer than the destination blob.
            $CopyThisBlob = $false
            if ($DestBlobsHash.Count -eq 0) {
                $CopyThisBlob = $true
            } elseif (-not $DestBlobsHash.ContainsKey($SrcBlob.Name)) {
                $CopyThisBlob = $true
            } elseif ($SrcBlob.LastModified.UtcDateTime -gt $DestBlobsHash.Item($SrcBlob.Name)) {
                $CopyThisBlob = $true
            }

            if ($CopyThisBlob) {
                # Start copying the blob to the destination container.
                $blobToCopy = $SrcBlob.Name
                "Copying blob: $blobToCopy to destination"
                $FilesTransferred++
                try {
                    # -ErrorAction Stop turns non-terminating errors into
                    # exceptions so that the catch block actually runs.
                    $c = Start-AzureStorageBlobCopy -SrcContainer $container -SrcBlob $blobToCopy -DestContainer $container -DestBlob $blobToCopy -SrcContext $sourceContext -DestContext $destContext -Force -ErrorAction Stop
                    $FilesTransferSuccess++
                } catch {
                    Write-Error "$blobToCopy transfer failed"
                    $FilesTransferFail++
                }
            }
        }
    }
    while ($Token -ne $null)
    $sw.Stop()
    "Total blobs in container $container : $Total"
    "Total files transferred: $FilesTransferred"
    "Transfer successfully: $FilesTransferSuccess"
    "Transfer failed: $FilesTransferFail"
    "Elapsed time: $($sw.Elapsed) `n"
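If you want to check the copy decision without touching a storage account, the three conditions in the loop can be pulled out into a small function. This is only a sketch: `Test-BlobNeedsCopy` and its parameter names are hypothetical and not part of the runbook, and the sample data stands in for real blob listings.

```powershell
# Hypothetical helper (not part of the runbook above): returns $true when the
# source blob is absent from the destination index or is newer than the entry there.
function Test-BlobNeedsCopy {
    param(
        [string]    $Name,
        [datetime]  $SourceModifiedUtc,
        [hashtable] $DestIndex
    )
    # Missing in the destination (this also covers an empty index): copy it.
    if (-not $DestIndex.ContainsKey($Name)) { return $true }
    # Present in the destination: copy only if the source is newer.
    return ($SourceModifiedUtc -gt $DestIndex[$Name])
}

# Stand-in for the name -> LastModified index built from the destination listing.
$destIndex = @{ 'a.txt' = [datetime]'2016-03-31 17:03:07' }

Test-BlobNeedsCopy -Name 'b.txt' -SourceModifiedUtc ([datetime]'2016-01-01 00:00:00') -DestIndex $destIndex
Test-BlobNeedsCopy -Name 'a.txt' -SourceModifiedUtc ([datetime]'2016-04-01 00:00:00') -DestIndex $destIndex
Test-BlobNeedsCopy -Name 'a.txt' -SourceModifiedUtc ([datetime]'2016-03-31 17:03:07') -DestIndex $destIndex
```

The first two calls describe a missing blob and a newer source blob, the third an unchanged one, which mirrors the `/XO` behaviour the runbook is trying to reproduce.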


The last-modified time is stored on the ICloudBlob object; you can access it with PowerShell like this:

    $blob = Get-AzureStorageBlob -Context $Context -Container $container
    $blob[1].ICloudBlob.Properties.LastModified

This gives you:

DateTime : 31/03/2016 17:03:07
UtcDateTime : 31/03/2016 17:03:07
LocalDateTime : 31/03/2016 18:03:07
Date : 31/03/2016 00:00:00
Day : 31
DayOfWeek : Thursday
DayOfYear : 91
Hour : 17
Millisecond : 0
Minute : 3
Month : 3
Offset : 00:00:00
Second : 7
Ticks : 635950405870000000
UtcTicks : 635950405870000000
TimeOfDay : 17:03:07
Year : 2016

Having read through the API, I don't think it is possible to search a container by any parameter other than name. I can only imagine that the Node.js library still retrieves all blobs and then filters them.
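The same "retrieve everything, then filter" approach can be sketched in PowerShell. The helper name `Select-ModifiedSince` and the sample objects are made up for illustration; the only assumption about the input is that each object exposes a `LastModified` property, as the blobs returned by Get-AzureStorageBlob do.

```powershell
# Hypothetical helper: passes through only the objects whose LastModified
# is later than the cutoff. The filtering happens on the client, after listing.
function Select-ModifiedSince {
    param(
        [Parameter(ValueFromPipeline = $true)] $Blob,
        [datetimeoffset] $Cutoff
    )
    process {
        if ($Blob.LastModified -gt $Cutoff) { $Blob }
    }
}

# Stand-in objects; with real storage the input would be the blob listing.
$sample = @(
    [pscustomobject]@{ Name = 'old.log'; LastModified = [datetimeoffset]'2016-03-01T00:00:00Z' }
    [pscustomobject]@{ Name = 'new.log'; LastModified = [datetimeoffset]'2016-03-31T17:03:07Z' }
)

$recent = @($sample | Select-ModifiedSince -Cutoff ([datetimeoffset]'2016-03-15T00:00:00Z'))
$recent | ForEach-Object { $_.Name }
```

Against a real container this would presumably be used as `Get-AzureStorageBlob -Context $Context -Container $container | Select-ModifiedSince -Cutoff $cutoff`, which still lists every blob before discarding the older ones.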

I will dig into it a little more, though.