Why does changing the sum order return a different result?


Maybe this question is stupid, but why does simply changing the order of the elements affect the result?
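The question's own code is not shown here. The following is a minimal Java reproduction that assumes the three values are the ones the answers below analyze (23.53, 5.88, 17.64). JavaScript numbers are IEEE 754 doubles, just like Java's `double`, so both languages should behave the same way. `SumOrder` and `sum` are illustrative names only.

```java
public class SumOrder {
    // Adds strictly left to right, as both Java and JavaScript do: (a + b) + c.
    static double sum(double a, double b, double c) {
        return a + b + c;
    }

    public static void main(String[] args) {
        double x = 23.53, y = 5.88, z = 17.64;
        double original = sum(x, y, z);   // (x + y) + z
        double reordered = sum(x, z, y);  // (x + z) + y
        System.out.println(original);
        System.out.println(reordered);
        System.out.println(original == reordered); // false: the totals differ in the last bit
    }
}
```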

Changing the order changes the points at which the values are rounded, based on their magnitude. As an example of the kind of thing that we're seeing, let's pretend that instead of binary floating point, we were using a decimal floating point type with 4 significant digits, where each addition is performed at "infinite" precision and then rounded to the nearest representable number. Here are two sums:

    1/3 + 2/3 + 2/3 = (0.3333 + 0.6667) + 0.6667
                    = 1.000 + 0.6667 (no rounding needed!)
                    = 1.667 (where 1.6667 is rounded to 1.667)

    2/3 + 2/3 + 1/3 = (0.6667 + 0.6667) + 0.3333
                    = 1.333 + 0.3333 (where 1.3334 is rounded to 1.333)
                    = 1.666 (where 1.6663 is rounded to 1.666)
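The imaginary 4-significant-digit type can be modelled with `java.math.BigDecimal`. This sketch assumes that a `MathContext` of precision 4 with `RoundingMode.HALF_EVEN` is a fair stand-in for "compute exactly, then round to the nearest representable number". `FourDigitDecimal` and `sumLeftToRight` are hypothetical names.

```java
import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;

public class FourDigitDecimal {
    // Hypothetical model of the 4-significant-digit decimal type:
    // each addition is computed exactly, then rounded to 4 significant digits.
    static final MathContext FOUR_DIGITS = new MathContext(4, RoundingMode.HALF_EVEN);

    // Adds the terms strictly left to right, rounding after every step.
    static BigDecimal sumLeftToRight(String... terms) {
        BigDecimal total = new BigDecimal(terms[0]);
        for (int i = 1; i < terms.length; i++) {
            total = total.add(new BigDecimal(terms[i]), FOUR_DIGITS);
        }
        return total;
    }

    public static void main(String[] args) {
        // 1/3 is approximated as 0.3333 and 2/3 as 0.6667.
        System.out.println(sumLeftToRight("0.3333", "0.6667", "0.6667").toPlainString()); // 1.667
        System.out.println(sumLeftToRight("0.6667", "0.6667", "0.3333").toPlainString()); // 1.666
    }
}
```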

We don't even need non-integers for this to be a problem:

    10000 + 1 - 10000 = (10000 + 1) - 10000
                      = 10000 - 10000 (where 10001 is rounded to 10000)
                      = 0

    10000 - 10000 + 1 = (10000 - 10000) + 1
                      = 0 + 1
                      = 1
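The same thing happens with real `double` values, not just with the imaginary decimal type. In this minimal sketch, 1e17 is an arbitrary choice for illustration: adjacent doubles near it are 16 apart, so 1e17 + 1 rounds back to 1e17, just as 10001 rounded to 10000 above. `LostSmallTerm`, `addFirst` and `cancelFirst` are hypothetical names.

```java
public class LostSmallTerm {
    // (big + small) - big: small is absorbed when added to big, then big cancels.
    static double addFirst(double big, double small) {
        return (big + small) - big;
    }

    // (big - big) + small: big cancels exactly first, so small survives.
    static double cancelFirst(double big, double small) {
        return (big - big) + small;
    }

    public static void main(String[] args) {
        System.out.println(Math.ulp(1e17));         // 16.0: spacing of doubles near 1e17
        System.out.println(addFirst(1e17, 1.0));    // 0.0
        System.out.println(cancelFirst(1e17, 1.0)); // 1.0
    }
}
```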

This demonstrates, perhaps more clearly, that the important part is that we have a limited number of significant digits, not a limited number of decimal places. If we could always keep the same number of decimal places, then with addition and subtraction at least, we'd be fine (so long as the values didn't overflow). The problem is that when you get to bigger numbers, smaller information is lost: 10001 is rounded to 10000 in this case. (This is an example of the problem that Eric Lippert noted in his answer.)

It's important to note that the values on the first line of the right-hand side are the same in all cases. So although it's important to understand that your decimal numbers (23.53, 5.88, 17.64) won't be represented exactly as double values, that's only a problem because of the rounding effects shown above.


Here's what's going on in binary. As we know, some floating-point values cannot be represented exactly in binary, even if they can be represented exactly in decimal. These 3 numbers are just examples of that fact.
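One way to see this inexactness directly: `new BigDecimal(double)` keeps every bit of the double, so comparing it with the decimal text shows whether the stored value is exact. This is a minimal sketch. `ExactValue` and `exact` are hypothetical names, and 0.5 is included only as a contrast, because a power of two is exactly representable.

```java
import java.math.BigDecimal;

public class ExactValue {
    // The exact binary value stored in the double, written out in decimal.
    static BigDecimal exact(double d) {
        return new BigDecimal(d);
    }

    public static void main(String[] args) {
        for (double d : new double[] {23.53, 5.88, 17.64, 0.5}) {
            BigDecimal stored = exact(d);
            BigDecimal written = new BigDecimal(Double.toString(d));
            String verdict = stored.compareTo(written) == 0 ? "exact" : "inexact";
            System.out.println(d + " is stored as " + stored + " (" + verdict + ")");
        }
    }
}
```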

With this program I output the hexadecimal representations of each number and the results of each addition.

    public class Main {
        public static void main(String[] args) {
            double x = 23.53;   // Inexact representation
            double y = 5.88;    // Inexact representation
            double z = 17.64;   // Inexact representation
            double s = 47.05;   // What math tells us the sum should be; still inexact

            printValueAndInHex(x);
            printValueAndInHex(y);
            printValueAndInHex(z);
            printValueAndInHex(s);

            System.out.println("--------");

            double t1 = x + y;
            printValueAndInHex(t1);
            t1 = t1 + z;
            printValueAndInHex(t1);

            System.out.println("--------");

            double t2 = x + z;
            printValueAndInHex(t2);
            t2 = t2 + y;
            printValueAndInHex(t2);
        }

        private static void printValueAndInHex(double d) {
            System.out.println(Long.toHexString(Double.doubleToLongBits(d)) + ": " + d);
        }
    }

The printValueAndInHex method is just a hex-printer helper.

The output is as follows:

    403787ae147ae148: 23.53
    4017851eb851eb85: 5.88
    4031a3d70a3d70a4: 17.64
    4047866666666666: 47.05
    --------
    403d68f5c28f5c29: 29.41
    4047866666666666: 47.05
    --------
    404495c28f5c28f6: 41.17
    4047866666666667: 47.050000000000004

The first 4 numbers are the hexadecimal representations of x, y, z, and s. In the IEEE 754 double format, counting from the most significant bit, bit 1 is the sign bit, bits 2-12 hold the 11-bit binary exponent (that is, the scale of the number), and the remaining 52 bits hold the mantissa. The exponent is stored with a bias: the actual exponent is the stored binary number minus 1023.
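The field layout can be checked in code. This is a minimal sketch using `Double.doubleToLongBits` and bit masks. `DoubleFields` and its methods are hypothetical helpers, not part of the program above. The standard `Math.getExponent` returns the same unbiased exponent directly.

```java
public class DoubleFields {
    // Bit 1 (the most significant bit): the sign.
    static int sign(double d) {
        return (int) (Double.doubleToLongBits(d) >>> 63);
    }

    // Bits 2-12 from the left: the 11-bit biased exponent.
    static int biasedExponent(double d) {
        return (int) ((Double.doubleToLongBits(d) >>> 52) & 0x7FF);
    }

    // The remaining 52 bits: the mantissa without its implicit leading 1.
    static long fraction(double d) {
        return Double.doubleToLongBits(d) & 0xFFFFFFFFFFFFFL;
    }

    public static void main(String[] args) {
        for (double d : new double[] {23.53, 5.88, 17.64, 47.05}) {
            int biased = biasedExponent(d);
            System.out.printf("%s: sign=%d, exponent=%d - 1023 = %d, fraction=%013x%n",
                    d, sign(d), biased, biased - 1023, fraction(d));
        }
    }
}
```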

The exponents for the first 4 numbers are extracted:

        sign|exponent
    403 => 0|100 0000 0011| => 1027 - 1023 = 4
    401 => 0|100 0000 0001| => 1025 - 1023 = 2
    403 => 0|100 0000 0011| => 1027 - 1023 = 4
    404 => 0|100 0000 0100| => 1028 - 1023 = 5

First set of additions

The second number (y) is of smaller magnitude: its exponent is 2, while x's is 4. When adding these two numbers to get x + y, y's mantissa is shifted right by 2 bits to line up the exponents, so the last 2 bits of y (01) are shifted out of range and do not figure into the calculation.
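The alignment step can be sketched in code. This assumes the shift equals the difference between the operands' unbiased exponents, which holds here because x + y keeps x's exponent of 4. `AlignmentShift`, `shift` and `shiftedOut` are illustrative names, not a library API.

```java
public class AlignmentShift {
    // Hypothetical helper: how far the smaller operand's mantissa is shifted right
    // so that its exponent matches the larger operand's.
    static int shift(double larger, double smaller) {
        return Math.getExponent(larger) - Math.getExponent(smaller);
    }

    // The low-order mantissa bits of the smaller operand that fall off the end.
    static long shiftedOut(double larger, double smaller) {
        long mask = (1L << shift(larger, smaller)) - 1;
        return Double.doubleToLongBits(smaller) & mask;
    }

    public static void main(String[] args) {
        double x = 23.53, y = 5.88;
        int s = shift(x, y); // assumed > 0 here, so the format width below is valid
        // Left-pad with zeros so that all s lost bits are shown.
        String lost = String.format("%" + s + "s", Long.toBinaryString(shiftedOut(x, y)))
                .replace(' ', '0');
        System.out.println("shift = " + s + ", lost bits = " + lost); // shift = 2, lost bits = 01
    }
}
```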

The second addition adds z to x + y, and both operands have the same scale (exponent 4).

Second set of additions

Here, x + z occurs first. They are of the same scale, but they yield a number that is higher up in scale:

404 => 0|100 0000 0100| => 1028 - 1023 = 5

The second addition adds x + z and y. Now the exponents differ by 3 (5 vs. 2), so 3 bits (101) are dropped from y to add the numbers. Here the result must round upwards, because those dropped bits are worth more than half of the last place, so the result is the next floating-point number up: 4047866666666666 for the first set of additions vs. 4047866666666667 for the second set. That error is large enough to show in the printout of the total.
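The following sketch checks both totals at the bit level. `RoundUp`, `firstOrder` and `secondOrder` are illustrative names. The standard `Math.nextUp` and `Math.ulp` confirm that the two totals are adjacent doubles, exactly one unit in the last place apart.

```java
public class RoundUp {
    static double firstOrder(double x, double y, double z) {
        return (x + y) + z;
    }

    static double secondOrder(double x, double y, double z) {
        return (x + z) + y;
    }

    public static void main(String[] args) {
        double x = 23.53, y = 5.88, z = 17.64;
        double t1 = firstOrder(x, y, z);
        double t2 = secondOrder(x, y, z);

        // (x + z) has exponent 5 and y has exponent 2, so y's low 3 bits are shifted out.
        long lost = Double.doubleToLongBits(y) & 0b111;
        System.out.println("lost bits of y: " + Long.toBinaryString(lost)); // 101

        System.out.println(Long.toHexString(Double.doubleToLongBits(t1))); // 4047866666666666
        System.out.println(Long.toHexString(Double.doubleToLongBits(t2))); // 4047866666666667
        System.out.println(t2 == Math.nextUp(t1));  // true: adjacent doubles
        System.out.println(t2 - t1 == Math.ulp(t1)); // true: one ulp apart
    }
}
```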

In conclusion, be careful when performing mathematical operations on IEEE numbers. Some decimal values have no exact binary representation, and adding values of different scales discards low-order bits, which makes the result even more inexact. Add and subtract numbers of similar scale if you can.


Jon's answer is of course correct. In your case the error is no larger than the error you would accumulate doing any simple floating point operation. You've got a scenario where in one case you get zero error and in another you get a tiny error; that's not actually that interesting a scenario. A good question is: are there scenarios where changing the order of calculations goes from a tiny error to a (relatively) enormous error? The answer is unambiguously yes.

Consider for example:

x1 = (a - b) + (c - d) + (e - f) + (g - h);

vs

x2 = (a + c + e + g) - (b + d + f + h);

vs

x3 = a - b + c - d + e - f + g - h;

Obviously in exact arithmetic they would be the same. It is entertaining to try to find values for a, b, c, d, e, f, g, h such that the values of x1 and x2 and x3 differ by a large quantity. See if you can do so!
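Here is one possible set of values, offered only as an illustration and not as the intended solution. The idea is to make the large terms cancel exactly, so that the small terms are either preserved or absorbed, depending on when they meet a large partial sum. With these hypothetical values the exact result is 2, yet the three orderings produce three different answers.

```java
public class ReorderingError {
    static double x1(double a, double b, double c, double d,
                     double e, double f, double g, double h) {
        return (a - b) + (c - d) + (e - f) + (g - h);
    }

    static double x2(double a, double b, double c, double d,
                     double e, double f, double g, double h) {
        return (a + c + e + g) - (b + d + f + h);
    }

    static double x3(double a, double b, double c, double d,
                     double e, double f, double g, double h) {
        return a - b + c - d + e - f + g - h;
    }

    public static void main(String[] args) {
        // 1e20 is exactly representable, and adjacent doubles near it are far more
        // than 1 apart, so adding 1 to a partial sum of that size changes nothing.
        double[] v = {1e20, 1e20, 1, 0, 1e20, 1e20, 1, 0};
        System.out.println(x1(v[0], v[1], v[2], v[3], v[4], v[5], v[6], v[7])); // 2.0 (exact)
        System.out.println(x2(v[0], v[1], v[2], v[3], v[4], v[5], v[6], v[7])); // 0.0
        System.out.println(x3(v[0], v[1], v[2], v[3], v[4], v[5], v[6], v[7])); // 1.0
    }
}
```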