Read file content from S3 bucket with boto3


boto3 offers a resource model that makes tasks like iterating through objects easier. Unfortunately, the StreamingBody returned by get() did not provide readline or readlines when this was written, so the example reads each object's body in full.

    import boto3

    s3 = boto3.resource('s3')
    bucket = s3.Bucket('test-bucket')

    # Iterates through all the objects, doing the pagination for you. Each obj
    # is an ObjectSummary, so it doesn't contain the body. You'll need to call
    # get to get the whole body.
    for obj in bucket.objects.all():
        key = obj.key
        body = obj.get()['Body'].read()
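If an object is too large to read in one call, one option is to wrap the body in a small line iterator built only on read(n). The sketch below is an illustration, not part of boto3: iter_lines_from_body is a hypothetical helper, it splits on "\n" only, and an io.BytesIO stands in for obj.get()['Body'] so the snippet runs without AWS access. Recent botocore releases also offer an iter_lines() method on StreamingBody, which may be preferable if your version provides it.

```python
import io


def iter_lines_from_body(body, chunk_size=1024):
    """Yield lines (without the trailing newline) from any object
    that supports read(n), such as a boto3 StreamingBody.

    Hypothetical helper for illustration; splits on b"\\n" only.
    """
    pending = b""
    while True:
        chunk = body.read(chunk_size)
        if not chunk:
            break
        pending += chunk
        # Everything before the last newline is complete; the tail
        # may be a partial line, so keep it for the next chunk.
        *complete, pending = pending.split(b"\n")
        for line in complete:
            yield line
    if pending:
        # Final line with no trailing newline.
        yield pending


# Stand-in for obj.get()['Body']; a real StreamingBody is read the same way.
fake_body = io.BytesIO(b"first\nsecond\nthird")
lines = list(iter_lines_from_body(fake_body, chunk_size=4))
print(lines)
```

With a real object you would pass obj.get()['Body'] in place of fake_body and decode each line as needed.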


You might also consider the smart_open module, which supports iterators:

    # Note: newer smart_open releases expose this function as smart_open.open.
    from smart_open import smart_open

    # stream lines from an S3 object
    for line in smart_open('s3://mybucket/mykey.txt', 'rb'):
        print(line.decode('utf8'))

and context managers:

    with smart_open('s3://mybucket/mykey.txt', 'rb') as s3_source:
        for line in s3_source:
            print(line.decode('utf8'))

        s3_source.seek(0)  # seek to the beginning
        b1000 = s3_source.read(1000)  # read 1000 bytes

Find smart_open at https://pypi.org/project/smart_open/


Using the client instead of resource:

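    import boto3

    s3 = boto3.client('s3')
    bucket = 'bucket_name'

    # list_objects returns at most 1000 keys per call
    result = s3.list_objects(Bucket=bucket, Prefix='/something/')
    # 'Contents' is absent when nothing matches, so default to an empty list
    for o in result.get('Contents', []):
        data = s3.get_object(Bucket=bucket, Key=o.get('Key'))
        contents = data['Body'].read()
        print(contents.decode("utf-8"))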
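Unlike the resource's objects.all(), a single client listing call does not paginate for you. A minimal sketch of one way to handle that uses the client's list_objects_v2 paginator. The function name read_all_objects is hypothetical, and the client is passed in as a parameter so the function can be tried with a stub before pointing it at a real bucket.

```python
def read_all_objects(client, bucket, prefix=""):
    """Yield (key, text) for every object under prefix.

    Hypothetical helper: `client` is expected to be a boto3 S3 client.
    Bodies are read in full and assumed to be UTF-8 text.
    """
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # A page has no 'Contents' key when it holds no objects.
        for entry in page.get("Contents", []):
            key = entry["Key"]
            response = client.get_object(Bucket=bucket, Key=key)
            yield key, response["Body"].read().decode("utf-8")


# Usage (needs AWS credentials; bucket and prefix are placeholders):
#   import boto3
#   for key, text in read_all_objects(boto3.client("s3"), "bucket_name", "something/"):
#       print(key, len(text))
```

Because it is a generator, only one object's contents is held in memory at a time.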