Copy millions of files from root AZStorage Blob to subfolders


Read doc: Move data to and from Azure Blob storage

The following articles describe how to move data to and from Azure Blob storage using different technologies.


In your case, I would suggest using the SDK, which is available for .NET, Java, Node.js, Python, Go, PHP, and Ruby.

Believe me, if you want to migrate your data from Azure Blob storage, Data Factory is not a good fit; it makes the problem more complicated. (This is my suggestion after migrating over 100 million JSON files, more than 2 TB, from Azure Blob storage.)
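To make the SDK route concrete, here is a minimal Python sketch rather than a definitive implementation. It assumes `container` behaves like `azure.storage.blob.ContainerClient`, that the target subfolder is derived from each blob's last-modified date (the question's real timestamp rule may differ), and that the credentials in use allow a server-side copy within the same account. The helper names `dated_name` and `move_root_blobs` are made up for illustration.

```python
from datetime import datetime


def dated_name(blob_name: str, stamp: datetime) -> str:
    """Build a destination such as '2021/03/07/a.json' (hypothetical layout)."""
    return f"{stamp:%Y/%m/%d}/{blob_name}"


def move_root_blobs(container, limit=None):
    """Copy each root-level blob into a dated subfolder, then delete the original.

    `container` is assumed to behave like azure.storage.blob.ContainerClient.
    Returns the number of blobs that were moved.
    """
    moved = 0
    # With a delimiter, walk_blobs yields root-level blobs plus virtual folder prefixes.
    for item in container.walk_blobs(delimiter="/"):
        if item.name.endswith("/"):
            continue  # a virtual folder prefix, not a blob
        if limit is not None and moved >= limit:
            break
        source = container.get_blob_client(item.name)
        target = container.get_blob_client(dated_name(item.name, item.last_modified))
        target.start_copy_from_url(source.url)
        # Delete the original only if the service already reports the copy as finished;
        # otherwise the blob stays in the root and is picked up again on a later run.
        if target.get_blob_properties().copy.status == "success":
            source.delete_blob()
            moved += 1
    return moved
```

Passing a `limit` lets you try the move on a handful of blobs before letting it loose on the whole container.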


If you have time... I would do the following:

Create an Azure Function that reads a file, gets your timestamp, and does the move operation. Scope the function to handle a single file. Then use events (Event Grid) on the storage account to trigger the function when a blob is created, so any new file is moved to the right spot automatically. (Remember that on the Consumption plan you need to reach a million executions per month before executions start being billed, so this is a low-cost option.)
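As a rough sketch of the filtering such a function would need, the snippet below decides whether an Event Grid event refers to a new root-level blob. It assumes the event is available as a plain dict with the `eventType` and `subject` fields of the Event Grid schema; in a real Azure Function the trigger binding would supply the event object instead. The function name `blob_to_move` is hypothetical.

```python
import re

# Subject format used by Blob storage events:
# /blobServices/default/containers/<container>/blobs/<blob name>
_SUBJECT = re.compile(r"^/blobServices/default/containers/([^/]+)/blobs/(.+)$")


def blob_to_move(event: dict):
    """Return (container, blob_name) for a newly created root-level blob, else None."""
    if event.get("eventType") != "Microsoft.Storage.BlobCreated":
        return None
    match = _SUBJECT.match(event.get("subject", ""))
    if match is None:
        return None
    container, blob_name = match.groups()
    # Skip blobs that already sit in a subfolder: the move itself creates a blob,
    # which would otherwise trigger the function again and again.
    if "/" in blob_name:
        return None
    return container, blob_name
```

The subfolder check matters because the copy into the target folder creates a blob too, so without it the function could keep re-triggering itself.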

For the current files, create another function (or, if you want more control, use a Logic App, though it will cost a bit more) and set the parallelism on the function or Logic App to a low value so you can keep an eye on your executions. Have it run a simple for-each loop, with limits, that calls your first function. This will slowly move your files out of that container, eventually getting you down to an item count that is reasonable to work with in tools like ADF. This might solve your problem for the long run, as any new files will be categorized as they arrive and your backlog is moved gradually. If you need to update a DB with a pointer to where each file lives, you could put that code in your function or Logic App as well. Just my two cents :)
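The "for-each with limits and low parallelism" idea can be sketched in plain Python as below. This is only an illustration: `move_one` is a hypothetical callable that moves a single blob and returns True on success (for example, a call into the first function), and the default batch size and worker count are arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def drain_backlog(names, move_one, batch_size=1000, max_workers=4):
    """Move at most `batch_size` blobs using a small, fixed number of workers.

    `names` is any iterable of blob names. Returns (moved_count, failed_names).
    Exceptions raised by `move_one` propagate to the caller.
    """
    # Take only one bounded batch so each run has a predictable cost.
    batch = list(islice(names, batch_size))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(move_one, batch))
    failed = [name for name, ok in zip(batch, results) if not ok]
    return len(batch) - len(failed), failed
```

Running this on a schedule with a small `batch_size` drains the backlog gradually, and the returned list of failed names can be retried on the next run.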


It is not clear whether you are using the hierarchical namespace provided by Azure Data Lake Storage Gen2. Without it, a Blob container only simulates a folder structure through name prefixes, which is not optimal.

ADLS Gen2 has several advantages that should help in your case, mainly related to move operations.
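To illustrate the move advantage: with a hierarchical namespace, a file can be moved by renaming it instead of copying and deleting. The sketch below assumes `file_system` behaves like `azure.storage.filedatalake.FileSystemClient`; the function name `move_with_rename` and the example paths are hypothetical.

```python
def move_with_rename(file_system, file_system_name: str, source_path: str, target_path: str):
    """Move a file inside an ADLS Gen2 file system by renaming it.

    `file_system` is assumed to behave like azure.storage.filedatalake.FileSystemClient.
    """
    file_client = file_system.get_file_client(source_path)
    # rename_file expects the new name prefixed with the file system name,
    # e.g. "raw/2021/03/a.json" for file system "raw".
    return file_client.rename_file(f"{file_system_name}/{target_path}")
```

For example, `move_with_rename(fs, "raw", "a.json", "2021/03/a.json")` would ask the service to rename `a.json` in place, with no data copy on the client side.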

To migrate from ADLS Gen1 to ADLS Gen2, have a look here.

Additionally, you may explore optimizations for your specific case in the paper here.