How to copy/paste DataFrame from Stack Overflow into Python
Pandas is written by people that really know what people want to do.
Since version 0.13
there's a function pd.read_clipboard
which is absurdly effective at making this "just work".
Copy and paste the part of the code in the question that starts bar foo
, (i.e. the DataFrame) and do this in a Python interpreter:
In [53]: import pandas as pdIn [54]: df = pd.read_clipboard()In [55]: dfOut[55]: bar foo0 4 11 5 22 6 3
Caveats
- Don't include the iPython
In
orOut
stuff or it won't work - If you have a named index, you currently need to add
engine='python'
(see this issue on GitHub). The 'c' engine is currently broken when the index is named. - It's not brilliant at MultiIndexes:
Try this:
0 1 2level1 level2 foo a 0.518444 0.239354 0.364764 b 0.377863 0.912586 0.760612bar a 0.086825 0.118280 0.592211
which doesn't work at all, or this:
0 1 2foo a 0.859630 0.399901 0.052504 b 0.231838 0.863228 0.017451bar a 0.422231 0.307960 0.801993
Which works, but returns something totally incorrect!
pd.read_clipboard()
is nifty. However, if you're writing code in a script or a notebook (and you want your code to work in the future) it's not a great fit. Here's an alternative way to copy/paste the output of a dataframe into a new dataframe object that ensures that df
will outlive the contents of your clipboard:
# py3 only, see below for py2import pandas as pdfrom io import StringIOd = '''0 1 2 3 4A Y N N YB N Y N NC N N N ND Y Y N YE N Y Y YF Y Y N YG Y N N Y'''df = pd.read_csv(StringIO(d), sep='\s+')
A few notes:
- The triple-quoted string preserves the newlines in the output.
StringIO
wraps the output in a file-like object, whichread_csv
requires.- Setting
sep
to\s+
makes it so that each contiguous block of whitespace is treated as a single delimiter.
update
The above answer is Python 3 only. If you're stuck in Python 2, replace the import line:
from io import StringIO
with instead:
from StringIO import StringIO
If you have an old version of pandas
(v0.24
or older) there's an easy way to write a Py2/Py3 compatible version of the above code:
import pandas as pdd = ...df = pd.read_csv(pd.compat.StringIO(d), sep='\s+')
The newest versions of pandas
have dropped the compat
module along with Python 2 support.
If you are copy-pasting from CSV file which has standard entries like this:
2016,10,M,0600,0610,13,1020,242016,3,F,0300,0330,21,6312,12015,4,M,0800,0830,8,7112,302015,10,M,0800,0810,19,0125,12016,8,M,1500,1510,21,0910,22015,10,F,0800,0810,3,8413,5df =pd.read_clipboard(sep=",", header=None)df.rename(columns={0: "Name0", 1: "Name1",2:"Name2",3:"Name3",4:"Name4",5:"Name5",6:"Name6",7:"Name7",8:"Name8"})
will give you properly defined pandas Dataframe.