Getting first n unique elements from Python list
I would use a set
to remember what was seen and return from the generator when you have seen
enough:
a = [1, 2, 2, 3, 3, 4, 5, 6] def get_unique_N(iterable, N): """Yields (in order) the first N unique elements of iterable. Might yield less if data too short.""" seen = set() for e in iterable: if e in seen: continue seen.add(e) yield e if len(seen) == N: return k = get_unique_N([1, 2, 2, 3, 3, 4, 5, 6], 4)print(list(k))
Output:
[1, 2, 3, 4]
According to PEP-479 you should return
from generators, not raise StopIteration
- thanks to @khelwood & @iBug for that piece of comment - one never learns out.
With 3.6 you get a deprecated warning, with 3.7 it gives RuntimeErrors: Transition Plan if still using raise StopIteration
Your solution using elif element not in itr[:index] and count<upper:
uses O(k)
lookups - with k
being the length of the slice - using a set reduces this to O(1)
lookups but uses more memory because the set has to be kept as well. It is a speed vs. memory tradeoff - what is better is application/data dependend.
Consider [1, 2, 3, 4, 4, 4, 4, 5]
vs [1] * 1000 + [2] * 1000 + [3] * 1000 + [4] * 1000 + [5] * 1000 + [6]
:
For 6 uniques (in longer list):
- you would have lookups of
O(1)+O(2)+...+O(5001)
- mine would have
5001*O(1)
lookup + memory forset( {1, 2, 3, 4, 5, 6})
You can adapt the popular itertools
unique_everseen
recipe:
def unique_everseen_limit(iterable, limit=5): seen = set() seen_add = seen.add for element in iterable: if element not in seen: seen_add(element) yield element if len(seen) == limit: breaka = [1,2,2,3,3,4,5,6]res = list(unique_everseen_limit(a)) # [1, 2, 3, 4, 5]
Alternatively, as suggested by @Chris_Rands, you can use itertools.islice
to extract a fixed number of values from a non-limited generator:
from itertools import islicedef unique_everseen(iterable): seen = set() seen_add = seen.add for element in iterable: if element not in seen: seen_add(element) yield elementres = list(islice(unique_everseen(a), 5)) # [1, 2, 3, 4, 5]
Note the unique_everseen
recipe is available in 3rd party libraries via more_itertools.unique_everseen
or toolz.unique
, so you could use:
from itertools import islicefrom more_itertools import unique_everseenfrom toolz import uniqueres = list(islice(unique_everseen(a), 5)) # [1, 2, 3, 4, 5]res = list(islice(unique(a), 5)) # [1, 2, 3, 4, 5]
If your objects are hashable (int
s are hashable) you can write utility function using fromkeys
method of collections.OrderedDict
class (or starting from Python3.7 a plain dict
, since they became officially ordered) like
from collections import OrderedDictdef nub(iterable): """Returns unique elements preserving order.""" return OrderedDict.fromkeys(iterable).keys()
and then implementation of iterate
can be simplified to
from itertools import islicedef iterate(itr, upper=5): return islice(nub(itr), upper)
or if you want always a list
as an output
def iterate(itr, upper=5): return list(nub(itr))[:upper]
Improvements
As @Chris_Rands mentioned this solution walks through entire collection and we can improve this by writing nub
utility in a form of generator like others already did:
def nub(iterable): seen = set() add_seen = seen.add for element in iterable: if element in seen: continue yield element add_seen(element)