Convert pandas dataframe to NumPy array

`df.to_numpy()` is better than `df.values`; here's why.*

It's time to deprecate your usage of values and as_matrix().

pandas v0.24.0 introduced two new methods for obtaining NumPy arrays from pandas objects:

to_numpy(), which is defined on Index, Series, and DataFrame objects, and
array, which is defined on Index and Series objects only.
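As a quick check of which objects expose which accessor, here is a minimal sketch. The sample data and variable names are made up for illustration, and it assumes pandas 0.24 or later.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, only for illustration.
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]}, index=["a", "b", "c"])
ser = df["A"]
idx = df.index

# to_numpy() is available on all three and returns a NumPy ndarray each time.
arrays = [obj.to_numpy() for obj in (df, ser, idx)]
all_ndarrays = all(isinstance(arr, np.ndarray) for arr in arrays)

# .array exists on Series and Index and returns a pandas ExtensionArray
# rather than a plain ndarray.
ser_array = ser.array
idx_array = idx.array

# DataFrame has no .array attribute.
df_has_array = hasattr(df, "array")

print(all_ndarrays, type(ser_array).__name__, df_has_array)
```

The exact class name printed for `ser.array` depends on the pandas version, so the sketch only relies on it being an `ExtensionArray`.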

If you visit the v0.24 docs for .values, you will see a big red warning that says:

Warning: We recommend using DataFrame.to_numpy() instead.

See this section of the v0.24.0 release notes, and this answer for more information.

* to_numpy() is my recommended method for any production code that needs to run reliably for many versions into the future. However, if you're just making a scratchpad in Jupyter or the terminal, using .values to save a few milliseconds of typing is a permissible exception. You can always add the fit and finish later.

Towards Better Consistency: `to_numpy()`

In the spirit of better consistency throughout the API, a new method to_numpy has been introduced to extract the underlying NumPy array from DataFrames.

    # Imports used throughout this answer
    import numpy as np
    import pandas as pd

    # Setup
    df = pd.DataFrame(data={'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]},
                      index=['a', 'b', 'c'])

    # Convert the entire DataFrame
    df.to_numpy()
    # array([[1, 4, 7],
    #        [2, 5, 8],
    #        [3, 6, 9]])

    # Convert specific columns
    df[['A', 'C']].to_numpy()
    # array([[1, 7],
    #        [2, 8],
    #        [3, 9]])

As mentioned above, this method is also defined on Index and Series objects (see here).

    df.index.to_numpy()
    # array(['a', 'b', 'c'], dtype=object)

    df['A'].to_numpy()
    # array([1, 2, 3])

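By default, a view is returned when no conversion is needed (for example, when all columns share one dtype), so any modifications made will affect the original. With Copy-on-Write enabled in newer pandas versions, that view may be read-only instead.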

    v = df.to_numpy()
    v[0, 0] = -1

    df
    #    A  B  C
    # a -1  4  7
    # b  2  5  8
    # c  3  6  9

If you need a copy instead, use to_numpy(copy=True).
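A minimal sketch of the copy behaviour, using made-up sample data: the array returned with `copy=True` is modified, and the DataFrame is checked afterwards to confirm it was left alone.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, only for illustration.
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]}, index=["a", "b", "c"])

# copy=True guarantees an independent array.
arr = df.to_numpy(copy=True)
arr[0, 0] = -1

# The DataFrame is unchanged because arr owns its own data.
original_value = df.loc["a", "A"]

# np.shares_memory reports whether two arrays overlap in memory.
shares = np.shares_memory(arr, df.to_numpy())

print(original_value, shares)  # prints: 1 False
```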

pandas >= 1.0 update for ExtensionTypes

If you're using pandas 1.x, chances are you'll be dealing with extension types a lot more. You'll have to be a little more careful that these extension types are correctly converted.

    a = pd.array([1, 2, None], dtype="Int64")
    a
    # <IntegerArray>
    # [1, 2, <NA>]
    # Length: 3, dtype: Int64

    # Wrong
    a.to_numpy()
    # array([1, 2, <NA>], dtype=object)  # yuck, objects

    # Correct
    a.to_numpy(dtype='float', na_value=np.nan)
    # array([ 1.,  2., nan])

    # Also correct
    a.to_numpy(dtype='int', na_value=-1)
    # array([ 1,  2, -1])

This is called out in the docs.
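The same idea can be applied to a whole DataFrame of nullable columns. This is a sketch under the assumption that you are on pandas 1.1 or later, where `DataFrame.to_numpy` also accepts `na_value`; the column names and data are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame with two nullable-integer columns.
df_nullable = pd.DataFrame({
    "x": pd.array([1, 2, None], dtype="Int64"),
    "y": pd.array([4, None, 6], dtype="Int64"),
})

# State the target dtype and the missing-value marker explicitly,
# so the result is a float array with NaN rather than an object array.
result = df_nullable.to_numpy(dtype="float64", na_value=np.nan)

print(result.dtype)
print(result)
```

Choosing `float64` with `np.nan` keeps missing entries visible; an integer dtype with a sentinel such as `-1` works too, as long as the sentinel cannot collide with real data.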

If you need the `dtypes` in the result...

As shown in another answer, DataFrame.to_records is a good way to do this.

    df.to_records()
    # rec.array([('a', 1, 4, 7), ('b', 2, 5, 8), ('c', 3, 6, 9)],
    #           dtype=[('index', 'O'), ('A', '<i8'), ('B', '<i8'), ('C', '<i8')])

This cannot be done with to_numpy, unfortunately. However, as an alternative, you can use np.rec.fromrecords:

    v = df.reset_index()
    np.rec.fromrecords(v, names=v.columns.tolist())
    # rec.array([('a', 1, 4, 7), ('b', 2, 5, 8), ('c', 3, 6, 9)],
    #           dtype=[('index', '<U1'), ('A', '<i8'), ('B', '<i8'), ('C', '<i8')])

Performance-wise, the two are nearly the same (in this test, rec.fromrecords is a bit faster).

    df2 = pd.concat([df] * 10000)

    %timeit df2.to_records()
    # 12.9 ms ± 511 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

    %%timeit
    v = df2.reset_index()
    np.rec.fromrecords(v, names=v.columns.tolist())
    # 9.56 ms ± 291 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
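To illustrate what keeping the dtypes buys you, here is a small sketch that reads fields back out of the record array by name. It reuses the example DataFrame from earlier in this answer; the variable names are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]},
                  index=["a", "b", "c"])

rec = df.to_records()

# Field names come from the index and the column labels.
field_names = rec.dtype.names

# Each field keeps its own dtype and can be pulled out by name.
col_a = rec["A"]
first_row = rec[0]

print(field_names)                     # ('index', 'A', 'B', 'C')
print(col_a.tolist(), first_row["B"])  # [1, 2, 3] 4
```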

Rationale for Adding a New Method

to_numpy() (in addition to array) was added as a result of discussions under two GitHub issues GH19954 and GH23623.

Specifically, the docs mention the rationale:

[...] with .values it was unclear whether the returned value would be the actual array, some transformation of it, or one of pandas custom arrays (like Categorical). For example, with PeriodIndex, .values generates a new ndarray of period objects each time. [...]

to_numpy aims to improve the consistency of the API, which is a major step in the right direction. .values will not be deprecated in the current version, but I expect this may happen at some point in the future, so I would urge users to migrate to the newer API as soon as they can.
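The ambiguity described in the quoted rationale is easy to reproduce with a categorical Series. A minimal sketch, with made-up data:

```python
import numpy as np
import pandas as pd

# Hypothetical categorical data, only for illustration.
ser = pd.Series(["x", "y", "x"], dtype="category")

# .values hands back a pandas Categorical here, not an ndarray.
values_result = ser.values

# .to_numpy() returns a NumPy ndarray.
numpy_result = ser.to_numpy()

print(type(values_result).__name__, type(numpy_result).__name__)
# prints: Categorical ndarray
```

Code that passes the result to NumPy functions can therefore rely on `to_numpy()` without first checking what kind of object came back.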

Critique of Other Solutions

DataFrame.values has inconsistent behaviour, as already noted.

DataFrame.get_values() is simply a wrapper around DataFrame.values, so everything said above applies.

DataFrame.as_matrix() is deprecated (and was removed in pandas 1.0); do NOT use it!


To convert a pandas dataframe (df) to a numpy ndarray, use this code:

    df.values

    array([[nan, 0.2, nan],
           [nan, nan, 0.5],
           [nan, 0.2, 0.5],
           [0.1, 0.2, nan],
           [0.1, 0.2, 0.5],
           [0.1, nan, 0.5],
           [0.1, nan, nan]])


Note: The .as_matrix() method used in this answer is deprecated. Pandas 0.23.4 warns:

Method .as_matrix will be removed in a future version. Use .values instead.

Pandas has something built in...

numpy_matrix = df.as_matrix()

gives

    array([[nan, 0.2, nan],
           [nan, nan, 0.5],
           [nan, 0.2, 0.5],
           [0.1, 0.2, nan],
           [0.1, 0.2, 0.5],
           [0.1, nan, 0.5],
           [0.1, nan, nan]])
