Pythonic way to sort a list of namedtuples by field name


from operator import attrgetter
from collections import namedtuple

Person = namedtuple('Person', 'name age score')
seq = [Person(name='nick', age=23, score=100),
       Person(name='bob', age=25, score=200)]

Sort list by name

sorted(seq, key=attrgetter('name'))

Sort list by age

sorted(seq, key=attrgetter('age'))


sorted(seq, key=lambda x: x.name)
sorted(seq, key=lambda x: x.age)
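As a quick check that the two approaches above are interchangeable, here is a minimal sketch using the same Person data. The variable names (by_name_attrgetter, by_name_lambda, by_age_desc) are only illustrative, and the reverse=True call is an extra not covered in the answers above.

```python
from collections import namedtuple
from operator import attrgetter

Person = namedtuple('Person', 'name age score')
seq = [Person(name='nick', age=23, score=100),
       Person(name='bob', age=25, score=200)]

# Both key functions extract the same field, so the orderings match.
by_name_attrgetter = sorted(seq, key=attrgetter('name'))
by_name_lambda = sorted(seq, key=lambda x: x.name)
assert by_name_attrgetter == by_name_lambda
print([p.name for p in by_name_attrgetter])  # ['bob', 'nick']

# sorted() returns a new list and leaves seq untouched;
# reverse=True gives descending order.
by_age_desc = sorted(seq, key=attrgetter('age'), reverse=True)
print([p.age for p in by_age_desc])  # [25, 23]
```

Since sorted() never modifies its input, seq keeps its original order after both calls.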


I tested the two alternatives given here for speed, since @zenpoy was concerned about performance.

Testing script:

import random
from collections import namedtuple
from timeit import timeit
from operator import attrgetter

runs = 10000
size = 10000

random.seed(42)  # fix the seed so runs are reproducible

Person = namedtuple('Person', 'name,age')
# Random numeric strings as names, random ages between 0 and 100
seq = [Person(str(random.randint(0, 10 ** 10)), random.randint(0, 100))
       for _ in range(size)]

def attrgetter_test_name():
    return sorted(seq.copy(), key=attrgetter('name'))

def attrgetter_test_age():
    return sorted(seq.copy(), key=attrgetter('age'))

def lambda_test_name():
    return sorted(seq.copy(), key=lambda x: x.name)

def lambda_test_age():
    return sorted(seq.copy(), key=lambda x: x.age)

# Each test sorts a fresh copy of the 10000-element list, repeated `runs` times
print('attrgetter_test_name', timeit(stmt=attrgetter_test_name, number=runs))
print('attrgetter_test_age', timeit(stmt=attrgetter_test_age, number=runs))
print('lambda_test_name', timeit(stmt=lambda_test_name, number=runs))
print('lambda_test_age', timeit(stmt=lambda_test_age, number=runs))

Results:

attrgetter_test_name 44.26793992166096
attrgetter_test_age 31.98247099677627
lambda_test_name 47.97959511074551
lambda_test_age 35.69356267603864

Using lambda was indeed slower: roughly 8% slower when sorting by name and roughly 12% slower when sorting by age.
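The relative slowdown can be worked out directly from the timings reported above. The sketch below only does that arithmetic; the helper name slowdown_percent is hypothetical, and the numbers are copied from the results, not re-measured.

```python
# Timings in seconds, copied from the results above.
timings = {
    'attrgetter_test_name': 44.26793992166096,
    'attrgetter_test_age': 31.98247099677627,
    'lambda_test_name': 47.97959511074551,
    'lambda_test_age': 35.69356267603864,
}

def slowdown_percent(lambda_time, attrgetter_time):
    """Return how much slower the lambda run was, as a percentage."""
    return (lambda_time / attrgetter_time - 1) * 100

name_slowdown = slowdown_percent(timings['lambda_test_name'],
                                 timings['attrgetter_test_name'])
age_slowdown = slowdown_percent(timings['lambda_test_age'],
                                timings['attrgetter_test_age'])

print(f'name: {name_slowdown:.1f}% slower')  # name: 8.4% slower
print(f'age: {age_slowdown:.1f}% slower')    # age: 11.6% slower
```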

EDIT:

I also tested sorting by multiple attributes, adding the following two test cases to the same setup:

def attrgetter_test_both():
    return sorted(seq.copy(), key=attrgetter('age', 'name'))

def lambda_test_both():
    return sorted(seq.copy(), key=lambda x: (x.age, x.name))

print('attrgetter_test_both', timeit(stmt=attrgetter_test_both, number=runs))
print('lambda_test_both', timeit(stmt=lambda_test_both, number=runs))

Results:

attrgetter_test_both 92.80101586919373
lambda_test_both 96.85089983147456

Lambda still underperforms, but by a smaller margin: it is now about 4% slower.

All tests were run on Python 3.6.0.
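To make the multi-attribute case more concrete: when attrgetter is given several field names, it returns a tuple of those fields, which is the same key the lambda builds by hand. A small sketch, with a three-person list (people) made up purely for illustration:

```python
from collections import namedtuple
from operator import attrgetter

Person = namedtuple('Person', 'name age')
# Illustrative data only; not the benchmark data.
people = [Person('nick', 23), Person('bob', 25), Person('alice', 23)]

# With several field names, attrgetter returns a tuple of the values.
key = attrgetter('age', 'name')
print(key(people[0]))  # (23, 'nick')

# Tuples compare element by element: age first, then name as tie-breaker.
by_age_then_name = sorted(people, key=key)
print([p.name for p in by_age_then_name])  # ['alice', 'nick', 'bob']

# The equivalent lambda builds the same tuple by hand.
assert by_age_then_name == sorted(people, key=lambda x: (x.age, x.name))
```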