How to import a text file on AWS S3 into pandas without writing to disk


Older pandas versions use boto for read_csv, so you should be able to:

import boto
import pandas as pd

data = pd.read_csv('s3://bucket....csv')

If you need boto3 because you are on Python 3.4+, you can:

import boto3
import io
import pandas as pd

s3 = boto3.client('s3')
obj = s3.get_object(Bucket='bucket', Key='key')
df = pd.read_csv(io.BytesIO(obj['Body'].read()))
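Under the hood, read_csv accepts any file-like object, which is why wrapping the response bytes in io.BytesIO works. A minimal local sketch of the same pattern, using hypothetical in-memory CSV bytes in place of obj['Body'].read():

```python
import io
import pandas as pd

# Stand-in for the bytes returned by obj['Body'].read() (hypothetical sample data)
raw = b"name,score\nalice,1\nbob,2\n"

# read_csv treats the BytesIO buffer just like a file on disk,
# so nothing is ever written to the filesystem
df = pd.read_csv(io.BytesIO(raw))
```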

Since version 0.20.1, pandas uses s3fs; see the answer below.


Now pandas can handle S3 URLs. You could simply do:

import pandas as pd
import s3fs

df = pd.read_csv('s3://bucket-name/file.csv')

You need to install s3fs if you don't have it: pip install s3fs

Authentication

If your S3 bucket is private and requires authentication, you have two options:

1- Add access credentials to your ~/.aws/credentials config file

[default]
aws_access_key_id=AKIAIOSFODNN7EXAMPLE
aws_secret_access_key=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY

Or

2- Set the following environment variables with their proper values:

  • AWS_ACCESS_KEY_ID
  • AWS_SECRET_ACCESS_KEY
  • AWS_SESSION_TOKEN
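The environment-variable route can also be taken from Python itself before calling read_csv, since boto3 and s3fs both pick these variables up automatically. A sketch using AWS's documented example credentials (placeholders, not real keys):

```python
import os

# Placeholder credentials (AWS's documented example values) -- replace with your own
os.environ["AWS_ACCESS_KEY_ID"] = "AKIAIOSFODNN7EXAMPLE"
os.environ["AWS_SECRET_ACCESS_KEY"] = "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"

# After this, pd.read_csv('s3://bucket-name/file.csv') can authenticate
# without any further configuration.
```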


This is now supported in the latest pandas. See

http://pandas.pydata.org/pandas-docs/stable/io.html#reading-remote-files

e.g.,

df = pd.read_csv('s3://pandas-test/tips.csv')