Reading a file from a private S3 bucket to a pandas dataframe
Pandas uses boto (not boto3) inside read_csv. You might be able to install boto and have it work correctly.
There are some issues with boto on Python 3.4.4 / 3.5.1. If you're on those platforms, and until those are fixed, you can use boto3 as follows:
import boto3
import pandas as pd

s3 = boto3.client('s3')
obj = s3.get_object(Bucket='bucket', Key='key')
df = pd.read_csv(obj['Body'])
That obj['Body'] is a file-like object with a .read method (which returns bytes), and that is enough for pandas.
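By default the boto3 client picks up credentials from the usual AWS credential chain (environment variables, ~/.aws/credentials, or an instance role). If you need to point it at a specific set of keys for the private bucket, you can pass them explicitly; this is just a sketch, and the key values, bucket, and key names below are placeholders:

import boto3
import pandas as pd

# Explicit credentials are optional; omit them to fall back to the
# standard AWS credential chain (env vars, ~/.aws/credentials, IAM role).
s3 = boto3.client(
    's3',
    aws_access_key_id='YOUR_ACCESS_KEY_ID',          # placeholder
    aws_secret_access_key='YOUR_SECRET_ACCESS_KEY',  # placeholder
)
obj = s3.get_object(Bucket='your-bucket-name', Key='path/to/your-csv.csv')
df = pd.read_csv(obj['Body'])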
Updated for Pandas 0.20.1
Pandas now uses s3fs to handle S3 connections (see the pandas docs):

pandas now uses s3fs for handling S3 connections. This shouldn't break any code. However, since s3fs is not a required dependency, you will need to install it separately, like boto in prior versions of pandas.
import os
import pandas as pd
from s3fs.core import S3FileSystem

# aws keys stored in ini file in same path
# refer to boto3 docs for config settings
os.environ['AWS_CONFIG_FILE'] = 'aws_config.ini'

s3 = S3FileSystem(anon=False)
key = 'path/to/your-csv.csv'
bucket = 'your-bucket-name'

df = pd.read_csv(s3.open('{}/{}'.format(bucket, key), mode='rb'))
# or with f-strings
df = pd.read_csv(s3.open(f'{bucket}/{key}', mode='rb'))
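If you'd rather not rely on a config file, s3fs can also take the credentials directly when constructing the filesystem. A minimal sketch; the key values and path are placeholders:

import pandas as pd
from s3fs import S3FileSystem

# Passing keys explicitly instead of via AWS_CONFIG_FILE; both values are placeholders.
s3 = S3FileSystem(anon=False,
                  key='YOUR_ACCESS_KEY_ID',
                  secret='YOUR_SECRET_ACCESS_KEY')
df = pd.read_csv(s3.open('your-bucket-name/path/to/your-csv.csv', mode='rb'))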
Update for pandas 0.22 and up:
If you have already installed s3fs (pip install s3fs), you can read the file directly from its S3 path, without importing s3fs explicitly:
data = pd.read_csv('s3://bucket....csv')
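For a private bucket this still works as long as your credentials are available through the standard AWS chain (environment variables, ~/.aws/credentials, or an IAM role), since pandas delegates to s3fs under the hood. A minimal sketch with a placeholder bucket and key:

import pandas as pd

# Credentials are resolved by s3fs/botocore from the usual AWS locations;
# the bucket and key names here are placeholders.
df = pd.read_csv('s3://your-bucket-name/path/to/your-csv.csv')
print(df.head())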