How do you Unit Test Python DataFrames


While Pandas' test functions are primarily used for internal testing, NumPy includes a very useful set of testing functions that are documented here: NumPy Test Support.

These functions compare NumPy arrays, but you can get the array that underlies a Pandas DataFrame using the values property. You can define a simple DataFrame and compare what your function returns to what you expect.
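As a rough sketch of that approach, the snippet below pulls the arrays out of a small DataFrame with `.values` and checks them with `numpy.testing`. The function `add_total` and the column names are hypothetical, made up only for illustration.

```python
import numpy as np
import pandas as pd


def add_total(df):
    # Hypothetical function under test: adds a 'total' column (price * quantity).
    result = df.copy()
    result["total"] = result["price"] * result["quantity"]
    return result


def test_add_total():
    df = pd.DataFrame({"price": [1.5, 2.0, 0.25], "quantity": [2, 3, 4]})
    actual = add_total(df)

    # Floating-point column: compare with a tolerance rather than exactly.
    np.testing.assert_allclose(actual["total"].values, np.array([3.0, 6.0, 1.0]))

    # Integer column that should be untouched: an exact comparison is fine.
    np.testing.assert_array_equal(actual["quantity"].values, np.array([2, 3, 4]))
```

Note that comparing `.values` ignores the index and column labels, so this only checks the data itself.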

One technique you can use is to define one set of test data for a number of functions. That way, you can use Pytest Fixtures to define that DataFrame once, and use it in multiple tests.
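Here is a minimal sketch of the fixture idea, assuming pytest is installed. The data, `make_orders`, `total_amount` and `amount_by_customer` are all hypothetical names invented for the example.

```python
import pandas as pd
import pytest


def make_orders():
    # Hypothetical shared test data, kept in a plain function so it can
    # also be called outside of pytest.
    return pd.DataFrame({"customer": ["a", "b", "a"], "amount": [10, 20, 30]})


@pytest.fixture
def orders():
    # pytest runs this for every test that lists `orders` as a parameter,
    # so each test gets its own fresh DataFrame.
    return make_orders()


def total_amount(df):
    # Hypothetical function under test.
    return df["amount"].sum()


def amount_by_customer(df):
    # Hypothetical function under test.
    return df.groupby("customer")["amount"].sum().to_dict()


def test_total_amount(orders):
    assert total_amount(orders) == 60


def test_amount_by_customer(orders):
    assert amount_by_customer(orders) == {"a": 40, "b": 20}
```

Because the fixture builds a new DataFrame for each test, a test that modifies the data in place cannot affect the others.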

In terms of resources, I found this article on Testing with NumPy and Pandas to be very useful. I also did a short presentation about data analysis testing at PyCon Canada 2016: Automate Your Data Analysis Testing.


You can also use the pandas testing functions in pd.testing.

They give you more flexibility in how you compare your computed result with the expected result.

For example:

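import pandas as pd

df1 = pd.DataFrame({'a': [1, 2, 3, 4, 5]})
df2 = pd.DataFrame({'a': [6, 7, 8, 9, 10]})
expected_res = pd.Series([7, 9, 11, 13, 15])

# check_names=False: the computed Series is named 'a', the expected one is unnamed
pd.testing.assert_series_equal(df1['a'] + df2['a'], expected_res, check_names=False)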

For more details, refer to this link.
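To give an idea of the "different ways" of comparing, here is a small sketch using a few options of `pd.testing.assert_frame_equal` and `pd.testing.assert_series_equal`. The data is made up, and the options shown are only a selection.

```python
import pandas as pd


def test_frame_equal_options():
    actual = pd.DataFrame({"b": [1.0, 2.0], "a": [3, 4]})
    expected = pd.DataFrame({"a": [3, 4], "b": [1, 2]})

    # check_like=True ignores the order of index and columns;
    # check_dtype=False lets the float64 and int64 'b' columns compare by value.
    pd.testing.assert_frame_equal(
        actual, expected, check_like=True, check_dtype=False
    )


def test_series_equal_with_tolerance():
    computed = pd.Series([0.1 + 0.2, 1.0])
    expected = pd.Series([0.3, 1.0])

    # rtol sets the relative tolerance used for floating-point values.
    pd.testing.assert_series_equal(computed, expected, rtol=1e-9)
```

Unlike comparing raw arrays, these functions also check the index, the column labels and (by default) the dtypes, so a failure message usually says which part differs.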


I don't think it's hard to create small DataFrames for unit testing?

import pandas as pd
from nose.tools import assert_dict_equal

input = pd.DataFrame.from_dict({
    'field_1': [some, values],
    'field_2': [other, values]
})
expected = {
    'result': [...]
}
# orient='list' makes to_dict() return {column: [values]}, matching the shape of `expected`
assert_dict_equal(expected, my_func(input).to_dict(orient='list'), "oops, there's a bug...")