Comparing two pandas series for floating point near-equality?


You can use numpy.allclose:

numpy.allclose(a, b, rtol=1e-05, atol=1e-08, equal_nan=False)

Returns True if two arrays are element-wise equal within a tolerance.

The tolerance values are positive, typically very small numbers. The relative difference (rtol * abs(b)) and the absolute difference atol are added together to compare against the absolute difference between a and b.
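To make the tolerance rule concrete, here is a small sketch that spells out the element-wise test |a - b| <= atol + rtol * |b| by hand and compares it with np.isclose and np.allclose. The sample values are made up for illustration.

```python
import numpy as np

# Illustrative values only
a = np.array([1.0, 100.0, 1e-9])
b = np.array([1.0 + 1e-6, 100.5, 0.0])

rtol, atol = 1e-05, 1e-08

# The element-wise test described above: |a - b| <= atol + rtol * |b|
manual = np.abs(a - b) <= atol + rtol * np.abs(b)

# np.isclose applies the same test per element; np.allclose reduces it with all()
elementwise = np.isclose(a, b, rtol=rtol, atol=atol)
overall = np.allclose(a, b, rtol=rtol, atol=atol)

print(manual)       # second element is outside the tolerance
print(elementwise)
print(overall)
```

Note that the test scales rtol by abs(b) only, so swapping a and b can change the result for values near the tolerance boundary.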

NumPy functions accept pandas.Series objects directly, so if you have two of them, s1 and s2, you can simply do:

np.allclose(s1, s2, atol=...) 

Where atol is your tolerance value.
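As a quick illustration of why this matters, the sketch below uses two made-up Series that differ only by floating point rounding. The tolerance 1e-9 is an arbitrary choice for the example, not a recommendation.

```python
import numpy as np
import pandas as pd

s1 = pd.Series([0.1 + 0.2, 1.0, 2.5])
s2 = pd.Series([0.3, 1.0, 2.5])

# Exact comparison fails because 0.1 + 0.2 is not exactly 0.3 in binary floating point
exact = bool((s1 == s2).all())

# Near-equality with an absolute tolerance succeeds
close = np.allclose(s1, s2, atol=1e-9)

# np.isclose gives the per-element result; wrap it to keep the Series index
mask = pd.Series(np.isclose(s1, s2, atol=1e-9), index=s1.index)

print(exact, close)
print(mask)
```

The element-wise mask is handy when you want to know which entries differ rather than just whether any do.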


NumPy works well with pandas Series. However, np.allclose compares values by position and ignores the index labels, so one has to be careful with the order of the index (or of both columns and index for a pandas DataFrame).

For example

series_1 = pd.Series(data=[0,1], index=['a','b'])
series_2 = pd.Series(data=[1,0], index=['b','a'])
np.allclose(series_1, series_2)

will return False, even though both series hold the same value under each label

A workaround is to reorder one series by the index of the other (this assumes both series contain the same labels)

np.allclose(series_1, series_2.loc[series_1.index])
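If you do this often, the workaround could be wrapped in a small function. The sketch below is one possible way to do it; allclose_aligned is a hypothetical name, not a pandas or NumPy function, and it assumes the index labels are unique and sortable.

```python
import numpy as np
import pandas as pd

def allclose_aligned(left, right, rtol=1e-05, atol=1e-08):
    """Hypothetical helper: compare two Series label by label."""
    # Series with different label sets can never match, so fail early.
    if not left.index.sort_values().equals(right.index.sort_values()):
        return False
    # reindex puts `right` into the same label order as `left`.
    return bool(np.allclose(left, right.reindex(left.index), rtol=rtol, atol=atol))

series_1 = pd.Series(data=[0, 1], index=['a', 'b'])
series_2 = pd.Series(data=[1, 0], index=['b', 'a'])

print(np.allclose(series_1, series_2))       # positional comparison
print(allclose_aligned(series_1, series_2))  # label-based comparison
```

The early index check avoids a KeyError from .loc (or silent NaN values from reindex) when the two series do not share the same labels.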


If you want to avoid numpy, there is another way: use assert_series_equal from pandas.testing

import pandas as pd
from pandas.testing import assert_series_equal
s1 = pd.Series([1.333333, 1.666666])
s2 = pd.Series([1.333, 1.666])
assert_series_equal(s1, s2)

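raises an AssertionError, because the default comparison is stricter than the difference between these values. In pandas versions before 1.1, use the check_less_precise flag

assert_series_equal(s1, s2, check_less_precise=True)  # No assertion error

This doesn't raise an AssertionError, as check_less_precise=True compares only 3 digits after the decimal point (the default is 5). Note that check_less_precise was deprecated in pandas 1.1 in favor of the rtol and atol parameters and removed in pandas 2.0.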
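On newer pandas versions, a rough equivalent can be written with the rtol parameter. This is a sketch assuming pandas 1.1 or later; the value rtol=1e-3 is picked here to roughly match "3 digits" for these sample values and is not an exact translation of the old flag.

```python
import pandas as pd
from pandas.testing import assert_series_equal

s1 = pd.Series([1.333333, 1.666666])
s2 = pd.Series([1.333, 1.666])

# With the default tolerances the comparison fails
try:
    assert_series_equal(s1, s2)
    default_passes = True
except AssertionError:
    default_passes = False

# A looser relative tolerance (assumed value) lets it pass without raising
assert_series_equal(s1, s2, rtol=1e-3)

print(default_passes)
```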

See the pandas.testing.assert_series_equal documentation.

Assertions are better suited to tests than to regular program logic, but if you want to avoid numpy, this is a way.
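If you want a plain True/False without numpy and without assertions, one option is to compare the absolute difference against a tolerance using pandas operations only. This is a minimal sketch; series_close is a hypothetical helper name and tol is an absolute tolerance chosen for illustration.

```python
import pandas as pd

def series_close(left, right, tol=1e-3):
    """Hypothetical helper: True if every aligned pair differs by at most tol."""
    # Subtraction aligns on index labels; labels present in only one Series
    # produce NaN, and NaN <= tol is False, so mismatched indexes give False.
    return bool((left - right).abs().le(tol).all())

s1 = pd.Series([1.333333, 1.666666])
s2 = pd.Series([1.333, 1.666])

print(series_close(s1, s2))            # within 1e-3
print(series_close(s1, s2, tol=1e-6))  # too strict for these values
```

Because the subtraction aligns on labels, this version is not affected by the index order, but NaN values in the data also count as a mismatch.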