Getting Google Spreadsheet CSV into A Pandas Dataframe
Seems to work for me without the StringIO
:
test = pd.read_csv('https://docs.google.com/spreadsheets/d/' + '0Ak1ecr7i0wotdGJmTURJRnZLYlV3M2daNTRubTdwTXc' + '/export?gid=0&format=csv', # Set first column as rownames in data frame index_col=0, # Parse column values to datetime parse_dates=['Quradate'] )test.head(5) # Same result as @TomAugspurger
BTW, including the ?gid=
enables importing different sheets, find the gid in the URL.
You can use read_csv()
on a StringIO
object:
from io import BytesIOimport requestsimport pandas as pdr = requests.get('https://docs.google.com/spreadsheet/ccc?key=0Ak1ecr7i0wotdGJmTURJRnZLYlV3M2daNTRubTdwTXc&output=csv')data = r.content In [10]: df = pd.read_csv(BytesIO(data), index_col=0,parse_dates=['Quradate'])In [11]: df.head()Out[11]: City region Res_Comm \0 Dothan South_Central-Montgomery-Auburn-Wiregrass-Dothan Residential 10 Foley South_Mobile-Baldwin Residential 12 Birmingham North_Central-Birmingham-Tuscaloosa-Anniston Commercial 38 Brent North_Central-Birmingham-Tuscaloosa-Anniston Residential 44 Athens North_Huntsville-Decatur-Florence Residential mkt_type Quradate National_exp Alabama_exp Sales_exp \0 Rural 2010-01-15 00:00:00 2 2 3 10 Suburban_Urban 2010-01-15 00:00:00 4 4 4 12 Suburban_Urban 2010-01-15 00:00:00 2 2 3 38 Rural 2010-01-15 00:00:00 3 3 3 44 Suburban_Urban 2010-01-15 00:00:00 4 5 4 Inventory_exp Price_exp Credit_exp 0 2 3 3 10 4 4 3 12 2 2 3 38 3 3 2 44 4 4 4
Open the specific sheet you want in your browser. Make sure it's at least viewable by anyone with the link. Copy and paste the URL. You'll get something like https://docs.google.com/spreadsheets/d/BLAHBLAHBLAH/edit#gid=NUMBER
.
sheet_url = 'https://docs.google.com/spreadsheets/d/BLAHBLAHBLAH/edit#gid=NUMBER'
First we turn that into a CSV export URL, like https://docs.google.com/spreadsheets/d/BLAHBLAHBLAH/export?format=csv&gid=NUMBER
:
csv_export_url = sheet_url.replace('/edit#gid=', '/export?format=csv&gid=')
Then we pass it to pd.read_csv, which can take a URL.
df = pd.read_csv(csv_export_url)
This will break if Google changes its API (it seems undocumented), and may give unhelpful errors if a network failure occurs.