Azure storage incremental copy by modified date
I am not sure whether this is the actual correct answer, but it is the solution I have settled on for now.
AzCopy is a bit faster, but since it is an executable I have no way to use it in Azure Automation.
I wrote my own runbook (it can be modified into a workflow) which implements the equivalent of the following AzCopy command:
AzCopy /Source:$sourceUri /Dest:$destUri /SourceKey:$sourceStorageKey /DestKey:$destStorageAccountKey /S /XO /Y
- Looking at List Blobs, we can only filter blobs by blob prefix (a prefix-only call is sketched right after this list), so I cannot pull blobs filtered by modified date. This leaves me pulling the whole blob list.
- I pull 20,000 blobs at a time from both the source and the destination using Get-AzureStorageBlob with a ContinuationToken.
- I loop through the 20,000 pulled source blobs and check whether each one is missing from the destination or has been modified in the source.
- If that check is true, I copy those blobs to the destination.
- It takes around 3-4 hours to go through 7 million blobs; the task takes longer depending on how many blobs have to be written to the destination.
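For context, the only server-side filter that Get-AzureStorageBlob (and the underlying List Blobs operation) supports is a name prefix; anything date-based has to be filtered on the client afterwards. A minimal sketch, using the same context and container variables as the snippet below (the "logs/2016/" prefix is just a hypothetical example):

# Server-side filtering is limited to a name prefix;
# modified-date filtering has to happen client-side afterwards.
$prefixedBlobs = Get-AzureStorageBlob -Context $sourceContext -Container $container -Prefix "logs/2016/"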
A code snippet:
# Loop through the source container blobs
# and copy to the destination any blob that is not already there
$MaxReturn = 20000
$Total = 0
$Token = $null
$FilesTransferred = 0
$FilesTransferSuccess = 0
$FilesTransferFail = 0
$sw = [Diagnostics.Stopwatch]::StartNew()
DO
{
    $SrcBlobs = Get-AzureStorageBlob -Context $sourceContext -Container $container -MaxCount $MaxReturn -ContinuationToken $Token |
        Select-Object -Property Name, LastModified, ContinuationToken

    $DestBlobsHash = @{}
    Get-AzureStorageBlob -Context $destContext -Container $container -MaxCount $MaxReturn -ContinuationToken $Token |
        Select-Object -Property Name, LastModified, ContinuationToken |
        ForEach-Object { $DestBlobsHash[$_.Name] = $_.LastModified.UtcDateTime }

    $Total += $SrcBlobs.Count

    if ($SrcBlobs.Length -le 0)
    {
        Break
    }
    $Token = $SrcBlobs[$SrcBlobs.Count - 1].ContinuationToken

    ForEach ($SrcBlob in $SrcBlobs)
    {
        # Copy the source blob if it is missing from the destination or has been modified since
        $CopyThisBlob = $false
        if ($DestBlobsHash.Count -eq 0)
        {
            $CopyThisBlob = $true
        }
        elseif (!$DestBlobsHash.ContainsKey($SrcBlob.Name))
        {
            $CopyThisBlob = $true
        }
        elseif ($SrcBlob.LastModified.UtcDateTime -gt $DestBlobsHash.Item($SrcBlob.Name))
        {
            $CopyThisBlob = $true
        }

        if ($CopyThisBlob)
        {
            # Start copying the blob to the destination container
            $blobToCopy = $SrcBlob.Name
            "Copying blob: $blobToCopy to destination"
            $FilesTransferred++
            try
            {
                $c = Start-AzureStorageBlobCopy -SrcContainer $container -SrcBlob $blobToCopy -DestContainer $container -DestBlob $blobToCopy -SrcContext $sourceContext -DestContext $destContext -Force
                $FilesTransferSuccess++
            }
            catch
            {
                Write-Error "$blobToCopy transfer failed"
                $FilesTransferFail++
            }
        }
    }
} While ($Token -ne $Null)

$sw.Stop()
"Total blobs in container $container : $Total"
"Total files transferred: $FilesTransferred"
"Transfer successfully: $FilesTransferSuccess"
"Transfer failed: $FilesTransferFail"
"Elapsed time: $($sw.Elapsed) `n"
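For completeness, $sourceContext and $destContext used above can be built from the account names and keys (the same keys the AzCopy command takes). A minimal sketch; the account and container names are placeholders, and in a runbook you would typically pull the keys from Automation assets:

# Placeholder account/container names - replace with your own values
$sourceContext = New-AzureStorageContext -StorageAccountName "sourceaccount" -StorageAccountKey $sourceStorageKey
$destContext   = New-AzureStorageContext -StorageAccountName "destaccount" -StorageAccountKey $destStorageAccountKey
$container     = "mycontainer"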
The last modified timestamp is stored in the ICloudBlob object; you can access it with PowerShell like this:
$blob = Get-AzureStorageBlob -Context $Context -Container $container
$blob[1].ICloudBlob.Properties.LastModified
Which will give you
DateTime : 31/03/2016 17:03:07
UtcDateTime : 31/03/2016 17:03:07
LocalDateTime : 31/03/2016 18:03:07
Date : 31/03/2016 00:00:00
Day : 31
DayOfWeek : Thursday
DayOfYear : 91
Hour : 17
Millisecond : 0
Minute : 3
Month : 3
Offset : 00:00:00
Second : 7
Ticks : 635950405870000000
UtcTicks : 635950405870000000
TimeOfDay : 17:03:07
Year : 2016
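Since the timestamp is exposed there, a client-side filter on it is straightforward. A minimal sketch, reusing the variables above; the one-day cutoff is just an example:

# Client-side filter: keep only blobs modified in the last day (example cutoff)
$recent = Get-AzureStorageBlob -Context $Context -Container $container |
    Where-Object { $_.ICloudBlob.Properties.LastModified -gt [DateTimeOffset]::UtcNow.AddDays(-1) }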
Having read through the API, I don't think it is possible to search a container on any parameter other than name. I can only imagine that the Node.js library still retrieves all blobs and then filters them.
I will dig into it a little bit more, though.