Comparing a dataframe on string lengths for different columns Comparing a dataframe on string lengths for different columns pandas pandas

Comparing a dataframe on string lengths for different columns


I think you need list comprehension, because string function works only with Series (column):

print ([df[col].str.len().min() for col in ['a','b','c']])

Another solution with apply:

print ([df[col].apply(len).min() for col in ['a','b','c']])

Sample:

df = pd.DataFrame({'a':['h','gg','yyy'],                   'b':['st','dsws','sw'],                   'c':['fffff','','rr'],                   'd':[1,3,5]})print (df)     a     b      c  d0    h    st  fffff  11   gg  dsws         32  yyy    sw     rr  5print ([df[col].str.len().min() for col in ['a','b','c']])[1, 2, 0]

Timings:

#[3000 rows x 4 columns]df = pd.concat([df]*1000).reset_index(drop=True)In [17]: %timeit ([df[col].apply(len).min() for col in ['a','b','c']])100 loops, best of 3: 2.63 ms per loopIn [18]: %timeit ([df[col].str.len().min() for col in ['a','b','c']])The slowest run took 4.12 times longer than the fastest. This could mean that an intermediate result is being cached.100 loops, best of 3: 2.88 ms per loop

Conclusion:

apply is faster, but not works with None.

df = pd.DataFrame({'a':['h','gg','yyy'],                   'b':[None,'dsws','sw'],                   'c':['fffff','','rr'],                   'd':[1,3,5]})print (df)     a     b      c  d0    h  None  fffff  11   gg  dsws         32  yyy    sw     rr  5print ([df[col].apply(len).min() for col in ['a','b','c']])

TypeError: object of type 'NoneType' has no len()

print ([df[col].str.len().min() for col in ['a','b','c']])[1, 2.0, 0]

EDIT by comment:

#fail with Noneprint (df[['a','b','c']].applymap(len).min(axis=1))0    11    02    2dtype: int64

#working with Noneprint (df[['a','b','c']].apply(lambda x: x.str.len().min(), axis=1))0    11    02    2dtype: int64