How could I save dataset from ipython notebook in Azure ML Studio? How could I save dataset from ipython notebook in Azure ML Studio? azure azure

How could I save dataset from ipython notebook in Azure ML Studio?


I have read the source code in the python package azureml, and found out that they are using a simple request post when uploading a dataset, which has a limited content length 4194304 bytes.

I tried to modify the code inside "http.py" within the python package azureml. I posted the request with a chunked data, and I got the following error:

Traceback (most recent call last):  File ".\azuremltest.py", line 10, in <module>    ws.datasets.add_from_dataframe(frame, 'GenericCSV', 'output2.csv', 'Uotput results')  File "C:\Python34\lib\site-packages\azureml\__init__.py", line 507, in add_from_dataframe    return self._upload(raw_data, data_type_id, name, description)  File "C:\Python34\lib\site-packages\azureml\__init__.py", line 550, in _uploadraw_data, None)  File "C:\Python34\lib\site-packages\azureml\http.py", line 135, in upload_dataset    upload_result = self._send_post_req(api_path, raw_data)  File "C:\Python34\lib\site-packages\azureml\http.py", line 197, in _send_post_req    raise AzureMLHttpError(response.text, response.status_code)azureml.errors.AzureMLHttpError: Chunked transfer encoding is not permitted. Upload size must be indicated in the Content-Length header.Request ID: 7b692d82-845c-4106-b8ec-896a91ecdf2d 2016-03-14 04:32:55Z

The REST API in azureml package does not support chunked transfer encoding. Hence, I took a look at how the Azure ML studio implements this, and I found out this:

  1. It post a request with content-length=0 to https://studioapi.azureml.net/api/resourceuploads/workspaces/<workspace_id>/?userStorage=true&dataTypeId=GenericCSV, which will return an id in the response body.

  2. Break the .csv file into chunks less than 4194304 bytes, and post them to https://studioapi.azureml.net/api/blobuploads/workspaces/<workspace_id>/?numberOfBlocks=<the number of chunks>&blockId=<index of chunk>&uploadId=<the id you get from previous request>&dataTypeId=GenericCSV

If you really want this functionality, you can implement it with python and the above REST API.

If you think it's too complicated, report the issue to this. The azureml python package is still under development, so your suggestion would be very helpful for them.


According to AzureML project on Github, the Workspace Object ws works via HTTP request for Azure Resource Management. SO your code is a resoure manager API request. The error AzureMLHttpError was caused by overing the limitation for Azure Resource Manager API request size. The maximum limit size is 4194304 bytes.

You can find it in the section Subscription limits - Azure Resource Manager of the doc Azure subscription and service limits, quotas, and constraints, please see the figure below.

enter image description here