Python: Rename duplicates in list with progressive numbers without sorting list


My solution with map and lambda:

print(list(map(lambda x: x[1] + str(mylist[:x[0]].count(x[1]) + 1) if mylist.count(x[1]) > 1 else x[1], enumerate(mylist))))

More traditional form

newlist = []
for i, v in enumerate(mylist):
    totalcount = mylist.count(v)
    count = mylist[:i].count(v)
    newlist.append(v + str(count + 1) if totalcount > 1 else v)

And last one

[v + str(mylist[:i].count(v) + 1) if mylist.count(v) > 1 else v for i, v in enumerate(mylist)]
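All three forms compute the same result. As a quick sanity check, here is a minimal sketch that wraps the comprehension in a function and runs it on the sample list used elsewhere on this page. The name `rename_duplicates` is made up for illustration and is not part of the original answer.

```python
def rename_duplicates(items):
    """Return a new list where repeated strings get 1, 2, 3, ... appended.

    Hypothetical helper name; the body is the list comprehension above.
    """
    return [
        v + str(items[:i].count(v) + 1) if items.count(v) > 1 else v
        for i, v in enumerate(items)
    ]


mylist = ["name", "state", "name", "city", "name", "zip", "zip"]
print(rename_duplicates(mylist))
# ['name1', 'state', 'name2', 'city', 'name3', 'zip1', 'zip2']
```

The input list is left untouched, and items that occur only once keep their original name.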


This is how I would do it. EDIT: I wrote this into a more generalized utility function since people seem to like this answer.

from collections import Counter  # Counter counts the number of occurrences of each item
from itertools import tee, count

def uniquify(seq, suffs=None):
    """Make all the items unique by adding a suffix (1, 2, etc).

    `seq` is a mutable sequence of strings.
    `suffs` is an optional alternative suffix iterable.
    """
    if suffs is None:
        # create a fresh counter per call; a count(1) default argument would be
        # shared between calls and would not restart at 1 the second time
        suffs = count(1)
    not_unique = [k for k, v in Counter(seq).items() if v > 1]  # so we have: ['name', 'zip']
    # suffix generator dict - e.g., {'name': <my_gen>, 'zip': <my_gen>}
    suff_gens = dict(zip(not_unique, tee(suffs, len(not_unique))))
    for idx, s in enumerate(seq):
        try:
            suffix = str(next(suff_gens[s]))
        except KeyError:
            # s was unique
            continue
        else:
            seq[idx] += suffix

mylist = ["name", "state", "name", "city", "name", "zip", "zip"]
check = ["name1", "state", "name2", "city", "name3", "zip1", "zip2"]
copy = mylist[:]  # so we will only mutate the copy in case of failure
uniquify(copy)
assert copy == check  # raise an error if we failed
mylist = copy  # success

If you wanted to append an underscore before each count, you could do something like this:

>>> mylist = ["name", "state", "name", "city", "name", "zip", "zip"]
>>> uniquify(mylist, (f'_{x!s}' for x in range(1, 100)))
>>> mylist
['name_1', 'state', 'name_2', 'city', 'name_3', 'zip_1', 'zip_2']

...or if you wanted to use letters instead:

>>> mylist = ["name", "state", "name", "city", "name", "zip", "zip"]
>>> import string
>>> uniquify(mylist, (f'_{x!s}' for x in string.ascii_lowercase))
>>> mylist
['name_a', 'state', 'name_b', 'city', 'name_c', 'zip_a', 'zip_b']

NOTE: this is not the fastest possible algorithm; for that, refer to the answer by ronakg. The advantage of the function above is it is easy to understand and read, and you're not going to see much of a performance difference unless you have an extremely large list.

EDIT: Here is my original answer as a one-liner. However, the order is not preserved and it uses the .index method, which is extremely suboptimal (as explained in the answer by DTing). See the answer by queezz for a nice 'two-liner' that preserves order.

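[s + str(suffix) if num > 1 else s for s, num in Counter(mylist).items() for suffix in range(1, num + 1)]
# Produces (on Python 3.7+, where Counter keeps insertion order):
# ['name1', 'name2', 'name3', 'state', 'city', 'zip1', 'zip2']
# Older versions may order the groups differently, e.g.:
# ['zip1', 'zip2', 'city', 'state', 'name1', 'name2', 'name3']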


Any method where count is called on each element is going to result in O(n^2) since count is O(n). You can do something like this:

# not modifying original list
from collections import Counter
mylist = ["name", "state", "name", "city", "name", "zip", "zip"]
counts = {k: v for k, v in Counter(mylist).items() if v > 1}
newlist = mylist[:]
for i in reversed(range(len(mylist))):
    item = mylist[i]
    if item in counts and counts[item]:
        newlist[i] += str(counts[item])
        counts[item] -= 1
print(newlist)
# ['name1', 'state', 'name2', 'city', 'name3', 'zip1', 'zip2']

# modifying original list
from collections import Counter
mylist = ["name", "state", "name", "city", "name", "zip", "zip"]
counts = {k: v for k, v in Counter(mylist).items() if v > 1}
for i in reversed(range(len(mylist))):
    item = mylist[i]
    if item in counts and counts[item]:
        mylist[i] += str(counts[item])
        counts[item] -= 1
print(mylist)
# ['name1', 'state', 'name2', 'city', 'name3', 'zip1', 'zip2']

This should be O(n).
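The same linear bound can also be reached with a forward pass, which some readers find easier to follow than iterating in reverse. The following is only a sketch under that assumption: the function name `rename_duplicates_linear` and the `seen` counter are illustrative and do not come from any of the answers.

```python
from collections import Counter


def rename_duplicates_linear(items):
    """Hypothetical helper: suffix repeated strings in one forward pass."""
    totals = Counter(items)  # one O(n) pass to count every item
    seen = Counter()         # how many times each item has appeared so far
    result = []
    for item in items:
        if totals[item] > 1:
            # item is a duplicate: number it by order of appearance
            seen[item] += 1
            result.append(item + str(seen[item]))
        else:
            # item is unique: keep it unchanged
            result.append(item)
    return result


mylist = ["name", "state", "name", "city", "name", "zip", "zip"]
print(rename_duplicates_linear(mylist))
# ['name1', 'state', 'name2', 'city', 'name3', 'zip1', 'zip2']
```

Each element is visited once and every dictionary lookup is O(1) on average, so the whole function stays O(n).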

Other provided answers:

mylist.index(s) per element causes O(n^2)

mylist = ["name", "state", "name", "city", "name", "zip", "zip"]
from collections import Counter
counts = Counter(mylist)
for s, num in counts.items():
    if num > 1:
        for suffix in range(1, num + 1):
            mylist[mylist.index(s)] = s + str(suffix)

count(x[1]) per element causes O(n^2)
It is also used multiple times per element along with list slicing.

print(list(map(lambda x: x[1] + str(mylist[:x[0]].count(x[1]) + 1) if mylist.count(x[1]) > 1 else x[1], enumerate(mylist))))

Benchmarks:

http://nbviewer.ipython.org/gist/dting/c28fb161de7b6287491b
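To get a rough feel for the difference on your own machine, a minimal timing sketch might look like the following. It is not the code from the linked notebook: the function names, the synthetic `col0, col1, ...` data and the default size of 2000 items are all assumptions chosen for illustration, and the absolute numbers will vary between machines.

```python
import timeit
from collections import Counter


def rename_quadratic(items):
    # calls list.count (and slices the list) for every element: O(n^2)
    return [
        v + str(items[:i].count(v) + 1) if items.count(v) > 1 else v
        for i, v in enumerate(items)
    ]


def rename_linear(items):
    # Counter-based reverse pass, as in the O(n) answer above
    counts = {k: v for k, v in Counter(items).items() if v > 1}
    result = items[:]
    for i in reversed(range(len(items))):
        item = items[i]
        if item in counts and counts[item]:
            result[i] += str(counts[item])
            counts[item] -= 1
    return result


def compare(n=2000, repeat=3):
    """Return (quadratic_seconds, linear_seconds) for a synthetic list of n items."""
    distinct = max(1, n // 2)  # with even n, every name appears exactly twice
    data = ["col%d" % (i % distinct) for i in range(n)]
    # both approaches must agree before timing them
    assert rename_quadratic(data) == rename_linear(data)
    slow = min(timeit.repeat(lambda: rename_quadratic(data), number=1, repeat=repeat))
    fast = min(timeit.repeat(lambda: rename_linear(data), number=1, repeat=repeat))
    return slow, fast


if __name__ == "__main__":
    slow, fast = compare()
    print("count-based: %.4fs, Counter-based: %.4fs" % (slow, fast))
```

Doubling `n` should roughly quadruple the count-based time while only doubling the Counter-based one, which is the behaviour the complexity argument predicts.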