Read file content from S3 bucket with boto3
boto3 offers a resource model that makes tasks like iterating through objects easier. Unfortunately, StreamingBody doesn't provide readline or readlines, so the straightforward approach is to read() each body in one go.
    import boto3

    s3 = boto3.resource('s3')
    bucket = s3.Bucket('test-bucket')
    # Iterates through all the objects, doing the pagination for you. Each obj
    # is an ObjectSummary, so it doesn't contain the body. You'll need to call
    # get() to fetch the whole body.
    for obj in bucket.objects.all():
        key = obj.key
        body = obj.get()['Body'].read()
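Because StreamingBody lacks readline, the loop above reads each body in one go. If you want lines without holding a whole object in memory, one workaround is to pull fixed-size chunks with read(amt) and split them yourself. Below is a minimal sketch, assuming only that the body supports read(amt), as StreamingBody and io.BytesIO do; iter_body_lines is a hypothetical helper name, and it treats only \n and \r\n as line endings. Recent botocore releases also ship a StreamingBody.iter_lines() method, so check your version before rolling your own.

```python
import io
from typing import BinaryIO, Iterator


def iter_body_lines(body: BinaryIO, chunk_size: int = 1024) -> Iterator[bytes]:
    """Yield lines (without line endings) from any object with read(amt)."""
    # Hypothetical helper: only b'\n' and b'\r\n' count as line endings.
    pending = b''
    while True:
        chunk = body.read(chunk_size)
        if not chunk:  # b'' means the stream is exhausted
            break
        pending += chunk
        # Split off every complete line; keep the trailing partial one.
        *complete, pending = pending.split(b'\n')
        for line in complete:
            yield line.rstrip(b'\r')
    # The last line may not end with a newline.
    if pending:
        yield pending.rstrip(b'\r')


# Demo with an in-memory stream standing in for obj.get()['Body']:
demo = io.BytesIO(b'first\r\nsecond\nthird')
print(list(iter_body_lines(demo, chunk_size=4)))  # [b'first', b'second', b'third']
```

With boto3 you would pass obj.get()['Body'] where the demo passes the BytesIO object.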
You might also consider the smart_open module, which supports iterators:
    from smart_open import open  # older releases: from smart_open import smart_open

    # stream lines from an S3 object
    for line in open('s3://mybucket/mykey.txt', 'rb'):
        print(line.decode('utf8'))
and context managers:
    from smart_open import open

    with open('s3://mybucket/mykey.txt', 'rb') as s3_source:
        for line in s3_source:
            print(line.decode('utf8'))

        s3_source.seek(0)  # seek to the beginning
        b1000 = s3_source.read(1000)  # read 1000 bytes
Find smart_open at https://pypi.org/project/smart_open/
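Without smart_open, plain boto3 can approximate the seek(0)/read(1000) example above by asking S3 for a byte range: get_object accepts a Range parameter in HTTP Range syntax, where the end offset is inclusive. This is a minimal sketch; range_header and read_range are hypothetical helper names, and client is assumed to be a boto3 S3 client.

```python
def range_header(start: int, length: int) -> str:
    """Return an HTTP Range value covering `length` bytes from `start`."""
    if start < 0 or length <= 0:
        raise ValueError('start must be >= 0 and length must be > 0')
    # HTTP byte ranges are inclusive, so the last byte is start + length - 1.
    return f'bytes={start}-{start + length - 1}'


def read_range(client, bucket: str, key: str, start: int, length: int) -> bytes:
    """Fetch up to `length` bytes from `start`, like seek(start) then read(length)."""
    resp = client.get_object(Bucket=bucket, Key=key,
                             Range=range_header(start, length))
    return resp['Body'].read()


# Usage (needs AWS credentials; bucket and key are placeholders):
#   import boto3
#   b1000 = read_range(boto3.client('s3'), 'mybucket', 'mykey.txt', 0, 1000)
```

Only the requested bytes are transferred, which matters for large objects where you need just a header or a slice.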
Using the client instead of resource:
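    import boto3

    s3 = boto3.client('s3')
    bucket = 'bucket_name'

    # A single list call returns at most 1000 keys, so let a paginator fetch
    # every page. S3 keys normally have no leading '/'.
    paginator = s3.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=bucket, Prefix='something/'):
        # 'Contents' is absent from a page when no keys match the prefix
        for o in page.get('Contents', []):
            data = s3.get_object(Bucket=bucket, Key=o['Key'])
            contents = data['Body'].read()
            print(contents.decode('utf-8'))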
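For comparison, the resource model can apply the same kind of Prefix filter through bucket.objects.filter(), which also paginates for you. A minimal sketch, assuming the objects hold text in a single encoding and that keys ending in '/' are zero-byte "folder" placeholders you want to skip; read_prefix is a hypothetical helper name.

```python
from typing import Dict


def read_prefix(bucket, prefix: str = '', encoding: str = 'utf-8') -> Dict[str, str]:
    """Return {key: decoded body} for every object whose key starts with prefix.

    `bucket` is expected to be a boto3 Bucket resource.
    """
    texts = {}
    # objects.filter() pages through results like objects.all(), but the
    # listing only includes keys that start with prefix.
    for obj in bucket.objects.filter(Prefix=prefix):
        if obj.key.endswith('/'):
            continue  # skip zero-byte "folder" placeholder keys
        texts[obj.key] = obj.get()['Body'].read().decode(encoding)
    return texts


# Usage (needs AWS credentials):
#   import boto3
#   texts = read_prefix(boto3.resource('s3').Bucket('bucket_name'), 'something/')
```

This loads every matching object into memory at once, so it suits many small text files rather than a few large ones.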