Numerically stable softmax

python numpy nan scientific-computing softmax

The softmax exp(x)/sum(exp(x)) is actually numerically well-behaved. It has only positive terms, so we needn't worry about loss of significance, and the denominator is at least as large as the numerator, so the result is guaranteed to fall between 0 and 1.

The only accident that might happen is overflow or underflow in the exponentials. Overflow of a single element of exp(x), or underflow of all of its elements, will render the output more or less useless.
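To make the two failure modes concrete, here is a minimal sketch of a direct, unshifted implementation. The name naive_softmax and the test inputs are illustrative assumptions, not part of the original answer.

```python
import numpy as np

def naive_softmax(x):
    # Direct transcription of exp(x) / sum(exp(x)), with no shift applied.
    e = np.exp(np.asarray(x, dtype=np.float64))
    return e / np.sum(e)

# Silence the floating-point warnings so only the results are shown.
with np.errstate(over="ignore", under="ignore", invalid="ignore", divide="ignore"):
    overflowed = naive_softmax([1000.0, 0.0])        # exp(1000) overflows to inf
    underflowed = naive_softmax([-1000.0, -1000.0])  # every exp underflows to 0

print(overflowed)   # first entry is nan, because inf / inf is undefined
print(underflowed)  # all entries are nan, because 0 / 0 is undefined
```

In both cases the result contains nan, which is the "more or less useless" output described above.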

But it is easy to guard against that by using the identity softmax(x) = softmax(x + c), which holds for any scalar c. Subtracting max(x) from x leaves a vector that has only non-positive entries, which rules out overflow, and at least one entry that is exactly zero, which rules out a vanishing denominator (underflow in some but not all entries is harmless).
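For readers who want to see why the identity holds, a short derivation (the index notation x_i is introduced here for illustration):

```latex
\[
\operatorname{softmax}(x + c)_i
  = \frac{e^{x_i + c}}{\sum_j e^{x_j + c}}
  = \frac{e^{c}\, e^{x_i}}{e^{c} \sum_j e^{x_j}}
  = \frac{e^{x_i}}{\sum_j e^{x_j}}
  = \operatorname{softmax}(x)_i
\]
```

With c = -max(x), every exponent x_j - max(x) is at most 0, so each term lies in (0, 1], and the term belonging to the maximum is exactly e^0 = 1, so the denominator is at least 1.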

Footnote: theoretically, catastrophic accidents in the sum are possible, but you'd need a ridiculous number of terms. For example, even using 16-bit floats, which can only resolve 3 decimals (compared to the 15 decimals of a "normal" 64-bit float), we'd need between 2^1431 (~6 x 10^430) and 2^1432 terms to get a sum that is off by a factor of two.
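The decimal resolutions quoted in the footnote can be read off NumPy's own type information. This is only a small check of those two figures, not a reproduction of the term-count estimate; the dictionary name precision is an assumption for illustration.

```python
import numpy as np

# Approximate number of decimal digits each float type can resolve,
# as reported by np.finfo.
precision = {
    np.dtype(t).name: np.finfo(t).precision
    for t in (np.float16, np.float64)
}

for name, digits in precision.items():
    print(name, digits)
```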


The softmax function is prone to two issues: overflow and underflow.

Overflow: occurs when very large numbers are approximated as infinity.

Underflow: occurs when very small numbers (near zero on the number line) are rounded to zero.
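Both effects are easy to observe on a single exponential in 64-bit floats. A small sketch, with the inputs 1000 and -1000 chosen arbitrarily as values far outside the representable range:

```python
import numpy as np

# Suppress the overflow/underflow warnings so only the values are printed.
with np.errstate(over="ignore", under="ignore"):
    big = np.exp(np.float64(1000.0))     # too large for float64
    small = np.exp(np.float64(-1000.0))  # too close to zero for float64

print(big)    # overflow: approximated as infinity
print(small)  # underflow: rounded to zero
```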

To combat these issues when doing softmax computation, a common trick is to shift the input vector by subtracting the maximum element in it from all elements. For the input vector x, define z such that:

z = x - max(x)

And then take the softmax of the new (stable) vector z.

Example:

```python
import numpy as np

def stable_softmax(x):
    # Shift so the largest entry becomes 0; exp() can then no longer overflow.
    z = x - np.max(x)
    numerator = np.exp(z)
    denominator = np.sum(numerator)
    softmax = numerator / denominator
    return softmax
```

```
# input vector
In [267]: vec = np.array([1, 2, 3, 4, 5])

In [268]: stable_softmax(vec)
Out[268]: array([ 0.01165623,  0.03168492,  0.08612854,  0.23412166,  0.63640865])

# input vector with really large number, prone to overflow issue
In [269]: vec = np.array([12345, 67890, 99999999])

In [270]: stable_softmax(vec)
Out[270]: array([ 0.,  0.,  1.])
```

In the above case, we safely avoided the overflow problem by using stable_softmax().

For more details, see the chapter "Numerical Computation" in the Deep Learning book.


Extending @kmario23's answer to support 1- or 2-dimensional NumPy arrays or lists (common if you're passing a batch of results through the softmax function):

```python
import numpy as np

def stable_softmax(x):
    z = x - np.max(x, axis=-1, keepdims=True)
    numerator = np.exp(z)
    denominator = np.sum(numerator, axis=-1, keepdims=True)
    softmax = numerator / denominator
    return softmax

test1 = np.array([12345, 67890, 99999999])  # 1D
test2 = np.array([[12345, 67890, 99999999], [123, 678, 88888888]])  # 2D
test3 = [12345, 67890, 999999999]
test4 = [[12345, 67890, 999999999]]

print(stable_softmax(test1))
print(stable_softmax(test2))
print(stable_softmax(test3))
print(stable_softmax(test4))
```

```
[0. 0. 1.]
[[0. 0. 1.]
 [0. 0. 1.]]
[0. 0. 1.]
[[0. 0. 1.]]
```
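As a sanity check on the batched version, the sketch below uses less extreme inputs so the probabilities are not all 0 or 1. The batch values are made up for illustration; the second row is the first row shifted by 999, so by the identity softmax(x) = softmax(x + c) both rows should give the same distribution.

```python
import numpy as np

def stable_softmax(x):
    # Same row-wise computation as in the answer above.
    z = x - np.max(x, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)

batch = np.array([[1.0, 2.0, 3.0],
                  [1000.0, 1001.0, 1002.0]])
probs = stable_softmax(batch)

# Each row should be a probability distribution on its own.
row_sums = probs.sum(axis=-1)

# Applying the function row by row should agree with the batched call.
rows_match = all(
    np.allclose(probs[i], stable_softmax(batch[i])) for i in range(len(batch))
)

print(probs)
print(row_sums)
print(rows_match)
```

Using axis=-1 with keepdims=True is what lets the same code handle both a single vector and a batch: the maximum and the sum keep a trailing axis of length 1, so they broadcast against each row.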
