Python: How to read and load an excel file from AWS S3? Python: How to read and load an excel file from AWS S3? python-3.x python-3.x

Python: How to read and load an excel file from AWS S3?


Spent quite some time on it and here's how I got it working,

import boto3import ioimport pandas as pdimport jsonaws_id = ''aws_secret = ''bucket_name = ''object_key = ''s3 = boto3.client('s3', aws_access_key_id=aws_id, aws_secret_access_key=aws_secret)obj = s3.get_object(Bucket=bucket_name, Key=object_key)data = obj['Body'].read()df = pd.read_excel(io.BytesIO(data), encoding='utf-8')


You can directly read xls file from S3 without having to download or save it locally. xlrd module has a provision to provide raw data to create workbook object. Following is the code snippet.

from boto3 import Session  from xlrd.book import open_workbook_xls  aws_id = ''    aws_secret = ''bucket_name = ''object_key = ''s3_session = Session(aws_access_key_id=aws_id, aws_secret_access_key=aws_secret)bucket_object = s3_session.resource('s3').Bucket(bucket_name).Object(object_key)content = bucket_object.get()['Body'].read()workbook = open_workbook_xls(file_contents=content)


Python doesn't support excel files natively. You could use the pandas library pandas library read_excel functionality