Python group by Python group by python python

Python group by


Do it in 2 steps. First, create a dictionary.

>>> input = [('11013331', 'KAT'), ('9085267', 'NOT'), ('5238761', 'ETH'), ('5349618', 'ETH'), ('11788544', 'NOT'), ('962142', 'ETH'), ('7795297', 'ETH'), ('7341464', 'ETH'), ('9843236', 'KAT'), ('5594916', 'ETH'), ('1550003', 'ETH')]>>> from collections import defaultdict>>> res = defaultdict(list)>>> for v, k in input: res[k].append(v)...

Then, convert that dictionary into the expected format.

>>> [{'type':k, 'items':v} for k,v in res.items()][{'items': ['9085267', '11788544'], 'type': 'NOT'}, {'items': ['5238761', '5349618', '962142', '7795297', '7341464', '5594916', '1550003'], 'type': 'ETH'}, {'items': ['11013331', '9843236'], 'type': 'KAT'}]

It is also possible with itertools.groupby but it requires the input to be sorted first.

>>> sorted_input = sorted(input, key=itemgetter(1))>>> groups = groupby(sorted_input, key=itemgetter(1))>>> [{'type':k, 'items':[x[0] for x in v]} for k, v in groups][{'items': ['5238761', '5349618', '962142', '7795297', '7341464', '5594916', '1550003'], 'type': 'ETH'}, {'items': ['11013331', '9843236'], 'type': 'KAT'}, {'items': ['9085267', '11788544'], 'type': 'NOT'}]

Note both of these do not respect the original order of the keys. You need an OrderedDict if you need to keep the order.

>>> from collections import OrderedDict>>> res = OrderedDict()>>> for v, k in input:...   if k in res: res[k].append(v)...   else: res[k] = [v]... >>> [{'type':k, 'items':v} for k,v in res.items()][{'items': ['11013331', '9843236'], 'type': 'KAT'}, {'items': ['9085267', '11788544'], 'type': 'NOT'}, {'items': ['5238761', '5349618', '962142', '7795297', '7341464', '5594916', '1550003'], 'type': 'ETH'}]


Python's built-in itertools module actually has a groupby function , but for that the elements to be grouped must first be sorted such that the elements to be grouped are contiguous in the list:

from operator import itemgettersortkeyfn = itemgetter(1)input = [('11013331', 'KAT'), ('9085267', 'NOT'), ('5238761', 'ETH'),  ('5349618', 'ETH'), ('11788544', 'NOT'), ('962142', 'ETH'), ('7795297', 'ETH'),  ('7341464', 'ETH'), ('9843236', 'KAT'), ('5594916', 'ETH'), ('1550003', 'ETH')] input.sort(key=sortkeyfn)

Now input looks like:

[('5238761', 'ETH'), ('5349618', 'ETH'), ('962142', 'ETH'), ('7795297', 'ETH'), ('7341464', 'ETH'), ('5594916', 'ETH'), ('1550003', 'ETH'), ('11013331', 'KAT'), ('9843236', 'KAT'), ('9085267', 'NOT'), ('11788544', 'NOT')]

groupby returns a sequence of 2-tuples, of the form (key, values_iterator). What we want is to turn this into a list of dicts where the 'type' is the key, and 'items' is a list of the 0'th elements of the tuples returned by the values_iterator. Like this:

from itertools import groupbyresult = []for key,valuesiter in groupby(input, key=sortkeyfn):    result.append(dict(type=key, items=list(v[0] for v in valuesiter)))

Now result contains your desired dict, as stated in your question.

You might consider, though, just making a single dict out of this, keyed by type, and each value containing the list of values. In your current form, to find the values for a particular type, you'll have to iterate over the list to find the dict containing the matching 'type' key, and then get the 'items' element from it. If you use a single dict instead of a list of 1-item dicts, you can find the items for a particular type with a single keyed lookup into the master dict. Using groupby, this would look like:

result = {}for key,valuesiter in groupby(input, key=sortkeyfn):    result[key] = list(v[0] for v in valuesiter)

result now contains this dict (this is similar to the intermediate res defaultdict in @KennyTM's answer):

{'NOT': ['9085267', '11788544'],  'ETH': ['5238761', '5349618', '962142', '7795297', '7341464', '5594916', '1550003'],  'KAT': ['11013331', '9843236']}

(If you want to reduce this to a one-liner, you can:

result = dict((key,list(v[0] for v in valuesiter)              for key,valuesiter in groupby(input, key=sortkeyfn))

or using the newfangled dict-comprehension form:

result = {key:list(v[0] for v in valuesiter)              for key,valuesiter in groupby(input, key=sortkeyfn)}


I also liked pandas simple grouping. it's powerful, simple and most adequate for large data set

result = pandas.DataFrame(input).groupby(1).groups