Convert pandas data frame to series
It's not smart enough to realize it's still a "vector" in math terms.
Say rather that it's smart enough to recognize a difference in dimensionality. :-)
I think the simplest thing you can do is select that row positionally using iloc, which gives you a Series with the columns as the new index and the values as the values:
>>> import pandas as pd
>>> df = pd.DataFrame([list(range(5))], columns=["a{}".format(i) for i in range(5)])
>>> df
   a0  a1  a2  a3  a4
0   0   1   2   3   4
>>> df.iloc[0]
a0    0
a1    1
a2    2
a3    3
a4    4
Name: 0, dtype: int64
>>> type(_)
<class 'pandas.core.series.Series'>
You can transpose the single-row dataframe (which still results in a dataframe) and then squeeze the result into a series (the inverse of to_frame). Calling squeeze(axis=0) on the original dataframe skips the transpose and gives the same series.
>>> df = pd.DataFrame([list(range(5))], columns=["a{}".format(i) for i in range(5)])
>>> df.squeeze(axis=0)
a0    0
a1    1
a2    2
a3    3
a4    4
Name: 0, dtype: int64
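As a quick sanity check, and assuming the same single-row df as above, the three spellings mentioned so far (iloc[0], squeeze(axis=0), and transpose-then-squeeze) should agree. The variable names below are only for illustration.

```python
import pandas as pd

df = pd.DataFrame([list(range(5))], columns=["a{}".format(i) for i in range(5)])

by_position = df.iloc[0]          # select the first row positionally
by_squeeze = df.squeeze(axis=0)   # drop the length-1 row axis
by_transpose = df.T.squeeze()     # transpose to a single column, then squeeze

# Series.equals compares index, values, and dtype.
print(by_position.equals(by_squeeze))
print(by_position.equals(by_transpose))
```

If both comparisons come back True, the choice between them is mostly a matter of readability.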
Note: To accommodate the point raised by @IanS (even though it is not in the OP's question), test for the dataframe's size. I am assuming that df is a dataframe, but the edge cases are an empty dataframe, a dataframe of shape (1, 1), and a dataframe with more than one row, in which case the user should implement their desired functionality.
if df.empty:
    # Empty dataframe, so convert to empty Series.
    result = pd.Series()
elif df.shape == (1, 1):
    # DataFrame with one value, so convert to series with appropriate index.
    result = pd.Series(df.iat[0, 0], index=df.columns)
elif len(df) == 1:
    # Convert to series per OP's question.
    result = df.T.squeeze()
else:
    # Dataframe with multiple rows. Implement desired behavior.
    pass
This can also be simplified along the lines of the answer provided by @themachinist.
if len(df) > 1:
    # Dataframe with multiple rows. Implement desired behavior.
    pass
else:
    result = pd.Series() if df.empty else df.iloc[0, :]
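For what it's worth, the same logic could be wrapped in a small helper. This is only a sketch: frame_to_series is a made-up name, and raising on multi-row input is my assumption, since the multi-row behavior is left to the user above. It also shows why the (1, 1) case needs care: a plain squeeze() collapses it all the way to a scalar, while iloc[0] still returns a Series.

```python
import pandas as pd


def frame_to_series(df):
    """Hypothetical helper: turn a frame with at most one row into a Series."""
    if len(df) > 1:
        # Multi-row behavior is left open in the thread; raising is one option.
        raise ValueError("expected at most one row, got {}".format(len(df)))
    if df.empty:
        # Explicit dtype so the result does not depend on the empty-Series default.
        return pd.Series(dtype=object)
    # iloc[0] keeps a (1, 1) frame as a one-element Series.
    return df.iloc[0]


one_by_one = pd.DataFrame([[42]], columns=["a0"])
print(type(frame_to_series(one_by_one)))  # a Series indexed by the column name
print(type(one_by_one.squeeze()))         # a scalar type, not a Series
```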
You can retrieve the series through slicing your dataframe using one of these two methods:
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.iloc.html
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.loc.html
import pandas as pd
import numpy as np

df = pd.DataFrame(data=np.random.randn(1, 8))
series1 = df.iloc[0, :]
type(series1)  # pandas.core.series.Series
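The snippet above only exercises iloc. A rough sketch of the loc variant follows; the "row0" label is invented here so that label-based and position-based selection are easy to tell apart.

```python
import numpy as np
import pandas as pd

# A custom row label (an assumption for this example) instead of the default 0.
df = pd.DataFrame(data=np.random.randn(1, 8), index=["row0"])

by_label = df.loc["row0", :]    # label-based selection
by_position = df.iloc[0, :]     # position-based selection

print(by_label.equals(by_position))
print(by_label.name)
```

Either way the result is a Series whose index is the original column labels and whose name is the row label.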