How can I convert an absolutely massive number to a string in a reasonable amount of time? How can I convert an absolutely massive number to a string in a reasonable amount of time? python python

How can I convert an absolutely massive number to a string in a reasonable amount of time?


Repeated string concatenation is notoriously inefficient since Python strings are immutable. I would go for

strprime = str(prime)

In my benchmarks, this is consistently the fastest solution. Here's my little benchmark program:

import decimaldef f1(x):    ''' Definition by OP '''    strprime = ""    while x > 0:        strprime += str(x%10)        x //= 10    return strprimedef digits(x):    while x > 0:        yield x % 10        x //= 10def f2(x):    ''' Using string.join() to avoid repeated string concatenation '''    return "".join((chr(48 + d) for d in digits(x)))def f3(x):    ''' Plain str() '''    return str(x)def f4(x):    ''' Using Decimal class'''    return decimal.Decimal(x).to_eng_string()x = 2**100if __name__ == '__main__':    import timeit    for i in range(1,5):        funcName = "f" + str(i)        print(funcName+ ": " + str(timeit.timeit(funcName + "(x)", setup="from __main__ import " + funcName + ", x")))

For me, this prints (using Python 2.7.10):

f1: 15.3430171013f2: 20.8928260803f3: 0.310356140137f4: 2.80087995529


Python's integer to string conversion algorithm uses a simplistic algorithm with a running of O(n**2). As the length of the number doubles, the conversion time quadruples.

Some simple tests on my computer show the increase in running time:

$ time py35 -c "n=str(2**1000000)"user    0m1.808s$ time py35 -c "n=str(2**2000000)"user    0m7.128s$ time py35 -c "n=str(2**4000000)"user    0m28.444s$ time py35 -c "n=str(2**8000000)"user    1m54.164s

Since the actual exponent is about 10 times larger than my last test value, it should take about 100 times longer. Or just over 3 hours.

Can it be done faster? Yes. There are several methods that are faster.

Method 1

It is faster to divide the very large number by a power-of-10 into two roughly equal-sized but smaller numbers. The process is repeated until the numbers are relatively small. Then str() is used on each number and leading zeroes are used to pad the result to the same length as the last power-of-10. Then the strings are joined to form the final result. This method is used by the mpmath library and the documentation implies it should be about 3x faster.

Method 2

Python's integers are stored in binary format. Binary is great for calculations but binary-to-decimal conversion is the bottleneck. It is possible to define your own integer type that stores the value in blocks of 100 (or some similar value) decimal digits. Operations (exponentiation, multiplication, division) will be slower but conversion to a string will be very fast.

Many years ago, I implemented such a class and used efficient algorithms for multiplication and division. The code is no longer available on the Internet but I did find a backup copy that I tested. The running time was reduced to ~14 seconds.

Update

I updated the DecInt code referenced above and it is now available at https://github.com/casevh/DecInt.

If Python's native integer type is used, the total running time is less than 14 seconds on my computer. If gmpy2's integer type is used instead, the running time is ~3.5 seconds.

$ py35 DecInt.pyCalculating 2^74207281Exponentiation time: 3.236Conversion to decimal format: 0.304Total elapsed time: 3.540Length of result: 22338618 digits

Method 3

I maintain the gmpy2 library that provide easy access to the GMP library for fast integer arithmetic. GMP implements Method 1 in highly optimized C and assembly code and calculates the prime number and the string representation in ~5 seconds.

Method 4

The decimal module in Python stores values as decimal digits. Recent versions of Python 3 include a C implementation of the decimal library that is much faster that the pure-Python implementation include with Python 2. The C implementation run in just over 3 seconds on my computer.

from decimal import *getcontext().prec = 23000000getcontext().Emin = -999999999getcontext().Emax = 999999999x=Decimal(2)**74207281 - 1s=str(x)


Took about 32 seconds to output the file using WinGhci (Haskell language):

import System.IOmain = writeFile "prime.txt" (show (2^74207281 - 1))

The file was 21 megabytes; the last four digits, 6351.