safe enough 8-character short unique random string


Your current method should be safe enough, but you could also take a look into the uuid module. e.g.

    import uuid
    print(str(uuid.uuid4())[:8])

Output:

ef21b9ad


Which method has fewer collisions, is faster, and is easier to read?

TLDR

random_choice is the fastest and has among the fewest collisions, but is IMO slightly harder to read.

The most readable is shortuuid_random, but it is an external dependency, takes about twice as long per call, and has roughly 6x the collisions.

The methods

    import random
    import secrets
    import string
    import uuid

    import shortuuid  # external dependency

    alphabet = string.ascii_lowercase + string.digits
    su = shortuuid.ShortUUID(alphabet=alphabet)

    def random_choice():
        return ''.join(random.choices(alphabet, k=8))

    def truncated_uuid4():
        return str(uuid.uuid4())[:8]

    def shortuuid_random():
        return su.random(length=8)

    def secrets_random_choice():
        return ''.join(secrets.choice(alphabet) for _ in range(8))

Results

All methods generate 8-character IDs. random_choice, shortuuid_random and secrets_random_choice draw from the abcdefghijklmnopqrstuvwxyz0123456789 alphabet; truncated_uuid4 only ever produces the hexadecimal characters 0123456789abcdef, a subset of it. Collisions are counted in a single run of 10 million draws. Time is the execution time in seconds of 1,000 calls, reported as mean ± standard deviation over 100 such runs. Total time is the total execution time of the collision test.

    random_choice: collisions 22 - time (s) 0.00229 ± 0.00016 - total (s) 29.70518
    truncated_uuid4: collisions 11711 - time (s) 0.00439 ± 0.00021 - total (s) 54.03649
    shortuuid_random: collisions 124 - time (s) 0.00482 ± 0.00029 - total (s) 51.19624
    secrets_random_choice: collisions 15 - time (s) 0.02113 ± 0.00072 - total (s) 228.23106
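As a rough sanity check on these counts, here is a minimal sketch of the birthday-style estimate, assuming every method samples uniformly from its output space. The function name expected_collisions is made up for this illustration. It suggests that the gap between truncated_uuid4 and the others comes mostly from the smaller space (16^8 hex strings versus 36^8 strings over the full alphabet) rather than from anything wrong with uuid4 itself.

```python
import math

def expected_collisions(draws: int, space: int) -> float:
    """Expected number of draws that repeat an earlier value, for uniform sampling."""
    # Expected number of distinct values seen: space * (1 - (1 - 1/space) ** draws),
    # computed with log1p/expm1 to stay accurate when 1/space is tiny.
    distinct = space * -math.expm1(draws * math.log1p(-1.0 / space))
    return draws - distinct

draws = 10_000_000
full_alphabet = expected_collisions(draws, 36 ** 8)  # a-z plus 0-9
hex_only = expected_collisions(draws, 16 ** 8)       # truncated uuid4

print(f"36-symbol alphabet: about {full_alphabet:.0f} collisions expected")
print(f"hex only:           about {hex_only:.0f} collisions expected")
```

Under that assumption the estimate is on the order of 18 collisions for the 36-symbol alphabet and a bit over 11,600 for hex, which is in line with the measured 22, 15 and 11711. The 124 measured for shortuuid_random sits above the uniform estimate.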

Notes

  • the default shortuuid alphabet includes uppercase characters, hence creating fewer collisions. To make it a fair comparison we need to select the same alphabet as the other methods.
  • the secrets methods token_hex and token_urlsafe, while possibly faster, have different alphabets and are hence not eligible for the comparison.
  • the alphabet and the ShortUUID instance are factored out as module variables, hence speeding up the method execution. This should not affect the TLDR.
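For reference, a small sketch of the two secrets helpers mentioned in the notes, sized so that each returns 8 characters. It is only meant to show how their alphabets differ from the lowercase-plus-digits one used in the benchmark; the variable names are illustrative.

```python
import secrets
import string

hex_token = secrets.token_hex(4)      # 4 random bytes -> 8 hexadecimal characters
url_token = secrets.token_urlsafe(6)  # 6 random bytes -> 8 URL-safe base64 characters

hex_alphabet = set("0123456789abcdef")                            # 16 symbols
url_alphabet = set(string.ascii_letters + string.digits + "-_")   # 64 symbols

print(hex_token, url_token)
```

token_hex is restricted to 16 symbols, like the truncated uuid4, while token_urlsafe draws on 64 symbols including uppercase letters, "-" and "_".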

Full testing details

    import random
    import secrets
    from statistics import mean
    from statistics import stdev
    import string
    import time
    import timeit
    import uuid

    import shortuuid


    alphabet = string.ascii_lowercase + string.digits
    su = shortuuid.ShortUUID(alphabet=alphabet)


    def random_choice():
        return ''.join(random.choices(alphabet, k=8))


    def truncated_uuid4():
        return str(uuid.uuid4())[:8]


    def shortuuid_random():
        return su.random(length=8)


    def secrets_random_choice():
        return ''.join(secrets.choice(alphabet) for _ in range(8))


    def test_collisions(fun):
        out = set()
        count = 0
        for _ in range(10_000_000):
            new = fun()
            if new in out:
                count += 1
            else:
                out.add(new)
        return count


    def run_and_print_results(fun):
        round_digits = 5
        now = time.time()
        collisions = test_collisions(fun)
        total_time = round(time.time() - now, round_digits)

        trials = 1_000
        runs = 100
        func_time = timeit.repeat(fun, repeat=runs, number=trials)
        avg = round(mean(func_time), round_digits)
        std = round(stdev(func_time), round_digits)

        print(f'{fun.__name__}: collisions {collisions} - '
              f'time (s) {avg} ± {std} - '
              f'total (s) {total_time}')


    if __name__ == '__main__':
        run_and_print_results(random_choice)
        run_and_print_results(truncated_uuid4)
        run_and_print_results(shortuuid_random)
        run_and_print_results(secrets_random_choice)


Is there a reason you can't use tempfile to generate the names?

Functions like mkstemp and NamedTemporaryFile are absolutely guaranteed to give you unique names; nothing based on random bytes is going to give you that.

If for some reason you don't actually want the file created yet (e.g., you're generating filenames to be used on some remote server or something), you can't be perfectly safe, but mktemp is still safer than random names.
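A minimal sketch of the two tempfile calls named above, for the case where the file can be created locally. The "demo_" prefix and ".txt" suffix are placeholders chosen for this example, not anything from the question.

```python
import os
import tempfile

# mkstemp creates the file atomically and returns an open descriptor plus its path.
# The caller is responsible for closing and deleting it.
fd, path = tempfile.mkstemp(prefix="demo_", suffix=".txt")
with os.fdopen(fd, "w") as handle:
    handle.write("hello")
created = os.path.exists(path)  # the file already exists under its unique name
os.remove(path)

# NamedTemporaryFile wraps the same idea and deletes the file when the block exits.
with tempfile.NamedTemporaryFile(mode="w", prefix="demo_", suffix=".txt") as tmp:
    tmp.write("hello")
    tmp_path = tmp.name

print(path, tmp_path)
```

Because the file is created at the moment the name is chosen, no other process can claim the same name in between, which is the guarantee a purely random string cannot give.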

Or just keep a 48-bit counter stored in some "global enough" location, so you guarantee going through the full cycle of names before a collision, and you also guarantee knowing when a collision is going to happen.
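The counter idea can be sketched as follows, assuming a 64-symbol URL-safe alphabet so that 8 characters hold exactly 48 bits. The names encode_counter and next_name are hypothetical, and the in-process itertools.count only stands in for whatever shared, persistent counter "global enough" means in your setup.

```python
import itertools
import string

ALPHABET = string.ascii_letters + string.digits + "-_"  # 64 symbols, 6 bits each
NAME_LENGTH = 8                                         # 8 * 6 = 48 bits
LIMIT = 64 ** NAME_LENGTH                               # equals 2 ** 48

def encode_counter(value: int) -> str:
    """Encode a counter value as a fixed-width 8-character name."""
    if not 0 <= value < LIMIT:
        # Past this point names would repeat, so fail loudly instead.
        raise OverflowError("counter exhausted: names would start repeating")
    chars = []
    for _ in range(NAME_LENGTH):
        value, digit = divmod(value, 64)
        chars.append(ALPHABET[digit])
    return "".join(reversed(chars))

# Stand-in for a shared counter (assumption: a real one would live in a database or file).
_counter = itertools.count()

def next_name() -> str:
    return encode_counter(next(_counter))

print(next_name(), next_name())
```

Every value below 2**48 maps to a distinct name, so the first collision can only happen after the full cycle, and the overflow check tells you exactly when that is.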

They're all safer, and simpler, and much more efficient than reading urandom and doing an md5.

If you really do want to generate random names, ''.join(random.choice(my_charset) for _ in range(8)) is also going to be simpler than what you're doing, and more efficient. Even urlsafe_b64encode(os.urandom(6)) is just as random as the MD5 hash, and simpler and more efficient.
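Spelled out with imports, the two one-liners above look roughly like this. The contents of my_charset are an assumption for the example; substitute whatever characters are valid for your names.

```python
import base64
import os
import random
import string

my_charset = string.ascii_lowercase + string.digits  # assumed charset, adjust as needed

# Non-cryptographic: fine when predictability is not a concern.
name_from_choice = "".join(random.choice(my_charset) for _ in range(8))

# 6 bytes from the OS random source encode to exactly 8 URL-safe base64 characters.
name_from_urandom = base64.urlsafe_b64encode(os.urandom(6)).decode("ascii")

print(name_from_choice, name_from_urandom)
```

Six bytes is a multiple of three, so the base64 output needs no "=" padding and comes out at exactly 8 characters.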

The only benefit of the cryptographic randomness and/or cryptographic hash function is in avoiding predictability. If that's not an issue for you, why pay for it? And if you do need to avoid predictability, you almost certainly need to avoid races and other much simpler attacks, so avoiding mkstemp or NamedTemporaryFile is a very bad idea.

Not to mention that, as Root points out in a comment, if you need security, MD5 doesn't actually provide it.