[NTLUG:Discuss] The wrong computation example from the newsgroup

Sun Mar 18 00:11:39 CST 2001

Chris Cox wrote:
> 
> Here's the original problem... for those NTLUGgers who want a challenge.

It's really not a challenge - it's just stupid.  Anyone with an ounce of
common sense and a couple of years programming experience can see what this
is all about.

> > If you've got a minute, and one of those 'bleeding edge' OSes, try compiling
> > with no optimizations and running this for fun:
> >
> > int main(void)
> > {
> > int a = 60, b = 6, c = 10;
> >
> > printf("%d = %d\n", (int) (((60/6)*0.3) + (10*0.7)), (int) ((( a/b)*0.3) +
> > ( c*0.7)));
> >
> > exit(1);
> > }

> > On all Linux distros, and only on Linux distros, ranging from an ancient
> > Slackware setup to the latest Red Hat, I get 9=10. On everything else, I get
> > 10=10.

Well, the difference is that the first expression is optimised by the compiler
and thus evaluated at compile time.  The second expression has to be computed
at runtime - although *some* compilers might figure out that they can compute
it at compile time too.

So - why is there a difference at all?

Well, 60/6 is just 10, so that's unambiguous and we really have: 

   10 * 0.3 

plus:

   10 * 0.7 

Let's look at 0.3 and 0.7 in binary:

  0.3 == binary .01001100110011001100110011....
  0.7 == binary .10110011001100110011001100....

These are infinitely recurring binary numbers - just like the number you
get by dividing 1 by 3 in decimal.  People tend to forget this.  Just because
a number terminates in one number base doesn't mean that it will in another.

So, when you cut each of those off at a finite number of bits and multiply by
10 using binary arithmetic, you get results that fall just short of 3 and 7:

  10*0.3 ==  2.9999999999999999999999999999.... 
  10*0.7 ==  6.9999999999999999999999999999....

...and when you add the two together, you get:

             9.9999999999999999999999999999....

...which your code then converts to an integer.  The (int) cast truncates
rather than rounds, so the result is 9.  Looks like Linux got it right and all
the others screwed up, doesn't it!

However, all floating point math tends to have roundoff issues - and the exact
answer you get depends on the order of optimisations, how the math
coprocessor's rounding and precision modes are set, etc.

The problem here is that the program is buggy since it relies critically on
exact roundoff of two binary numbers calculated in different ways...that's
virtually the first thing they teach about floating point math...NEVER COMPARE
TWO FLOATS FOR EXACT EQUALITY...which is (in effect) what this program is doing.

A perfectly functioning C compiler is quite at liberty to have printed 9==9, 9==10,
10==9 or 10==10...and it's allowed to do it differently depending on the phase
of the moon XOR'ed with the user's biorhythms.

There is nothing wrong with Linux in this case - this is a problem entirely
of their own making.  I presume that these precise numbers were actually
contrived to make Linux look bad.  For different sets of numbers, you'll
get other results that could make Linux look good and the others look bad.

> > Go figure, and remember that the whole OS is compiled with that.

But the whole OS doesn't have ridiculous code like that in it!

> > I think I'll just stick to FreeBSD as far as my intel boxes are concerned.

Dumb, dumb, dumb.

If it were an actual *BUG*, it would have shown up in much more significant
ways.  I run literally millions of lines of floating point code that I've
written under both Linux and SGI's IRIX and I have yet to see a problem with
Linux's compiler or floating point support.

-- 
Steve Baker   HomeEmail: <sjbaker1 at airmail.net>
              WorkEmail: <sjbaker at link.com>
              HomePage : http://web2.airmail.net/sjbaker1
              Projects : http://plib.sourceforge.net
                         http://tuxaqfh.sourceforge.net
                         http://tuxkart.sourceforge.net
                         http://prettypoly.sourceforge.net
                         http://freeglut.sourceforge.net