How to import a text file on AWS S3 into pandas without writing to disk
Older versions of pandas used boto for read_csv, so you should be able to:

```python
import boto
import pandas as pd

data = pd.read_csv('s3://bucket....csv')
```
If you need boto3 because you are on Python 3.4+, you can:

```python
import io

import boto3
import pandas as pd

s3 = boto3.client('s3')
obj = s3.get_object(Bucket='bucket', Key='key')
df = pd.read_csv(io.BytesIO(obj['Body'].read()))
```
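The `io.BytesIO` wrapper is what lets `pd.read_csv` parse the raw bytes returned by S3; the same pattern works with any in-memory bytes buffer, so you can verify it locally without an S3 connection (the sample CSV data here is made up for illustration):

```python
import io

import pandas as pd

# Stand-in for obj['Body'].read() -- any bytes object works the same way
raw = b"name,score\nalice,1\nbob,2\n"

df = pd.read_csv(io.BytesIO(raw))
print(df.shape)  # (2, 2)
```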
Since version 0.20.1, pandas uses s3fs; see the answer below.
Now pandas can handle S3 URLs. You could simply do:
```python
import pandas as pd
import s3fs

df = pd.read_csv('s3://bucket-name/file.csv')
```
You need to install s3fs if you don't have it:

```shell
pip install s3fs
```
Authentication
If your S3 bucket is private and requires authentication, you have two options:
1- Add access credentials to your ~/.aws/credentials config file:

```
[default]
aws_access_key_id=AKIAIOSFODNN7EXAMPLE
aws_secret_access_key=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
```
Or
2- Set the following environment variables with their proper values:

AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
AWS_SESSION_TOKEN
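In a POSIX shell that could look like the following sketch. botocore reads the uppercase forms of these names; the key values below are AWS's documented example placeholders, and the session token is a made-up stand-in (it is only needed when you use temporary credentials):

```shell
export AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
export AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
export AWS_SESSION_TOKEN=your-session-token  # only for temporary credentials
```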