[NTLUG:Discuss] The wrong computation example from the newsgroup

Sun Mar 18 13:02:19 CST 2001

On Sun, 18 Mar 2001 12:16:14 CST, the world broke into rejoicing as
Fred James <fredjame at concentric.net>  said:
> Having read all of the replies to date I must admit to feeling silly
> for having been sucked in - but sucked in I was.
> I appreciate the cool heads in the group for seeing through to the
> core issue and sharing that knowledge.
> Computers do what they are told, and since people must decide whether
> the results are appropriate an understanding of how the computer
> accomplishes a task is necessary.

Be aware that there may very well be a problem; it is rather surprising
that the compiler, evaluating the expression at compile time, comes up
with a different value than you get when you run a "compiled" version of
the computation.

You can get a more precise view of what is going on "under the covers,"
by the way, if rather than compiling to object code, you compile to
assembler.

% gcc -c -S ctest.c
% more ctest.s

For this, I get:
	.file	"ctest.c"
	.version	"01.01"
gcc2_compiled.:
.section	.rodata
.LC2:
	.string	"%d = %d\n"
	.align 8
.LC0:
	.long 0x33333333,0x3fd33333
	.align 8
.LC1:
	.long 0x66666666,0x3fe66666
.text
	.align 4
.globl main
	.type	 main,@function
main:
	pushl %ebp
	movl %esp,%ebp
	subl $40,%esp
	movl $60,-4(%ebp)
	movl $6,-8(%ebp)
	movl $10,-12(%ebp)
	addl $-4,%esp
	movl -4(%ebp),%eax
	leal -8(%ebp),%ecx
	cltd
	idivl (%ecx)
	movl %eax,-20(%ebp)
	fildl -20(%ebp)
	fldl .LC0
	fmulp %st,%st(1)
	fildl -12(%ebp)
	fldl .LC1
	fmulp %st,%st(1)
	faddp %st,%st(1)
	fnstcw -22(%ebp)
	movw -22(%ebp),%dx
	orw $3072,%dx
	movw %dx,-24(%ebp)
	fldcw -24(%ebp)
	fistpl -20(%ebp)
	movl -20(%ebp),%eax
	fldcw -22(%ebp)
	pushl %eax
	pushl $10
	pushl $.LC2
	call printf
	addl $16,%esp
	addl $-12,%esp
	pushl $1
	call exit
	addl $16,%esp
	.p2align 4,,7
.L2:
	leave
	ret
.Lfe1:
	.size	 main,.Lfe1-main
	.ident	"GCC: (GNU) 2.95.3 20010219 (prerelease)"

The optimized version, coming via:

% gcc -O2 -S ctest.c 

is rather shorter:

	.file	"ctest.c"
	.version	"01.01"
gcc2_compiled.:
.section	.rodata
.LC2:
	.string	"%d = %d\n"
.text
	.align 4
.globl main
	.type	 main,@function
main:
	pushl %ebp
	movl %esp,%ebp
	subl $24,%esp
	addl $-4,%esp
	pushl $10
	pushl $10
	pushl $.LC2
	call printf
	addl $-12,%esp
	pushl $1
	call exit
.Lfe1:
	.size	 main,.Lfe1-main
	.ident	"GCC: (GNU) 2.95.3 20010219 (prerelease)"

Color me "not vastly intimate with i386 assembler," but it is not
_overly_ difficult to make out what's going on here.

The first version of the program has a fair bunch of floating point
instructions; the complete _lack_ of FP instructions in the second is
quite conspicuous.  

In the second version, _all_ the calculations are performed by the
compiler.  The crucial bit is thus:

	pushl $10
	pushl $10
	pushl $.LC2
	call printf

This is what sets up the printf() call; it pushes two copies of $10
onto the stack, pushes a reference, $.LC2, to the format string, and
then calls printf.  In effect, the computations got optimized out; the
compiler figured out that both
	 (int) (((60/6)*0.3) + (10*0.7))
	 (int) ((( a/b)*0.3) + ( c*0.7))
were in fact calculating _exactly the same thing_, and so computed the
value, 10, better known here as $10, and put that into the assembler
code.

The optimized version does no computation whatsoever; all the program
does is push the value 10 onto the stack twice and then print those
values.

For those who are curious, the Alpha equivalent looks like:

--> Unoptimized:
	.file	1 "ctest.c"
	.set noat
	.set noreorder
.section	.rodata
$LC0:
	.ascii "%d = %d\12\0"
	.align 3
$LC1:
	.t_floating 2.99999999999999988898e-1
	.align 3
$LC2:
	.t_floating 6.99999999999999955591e-1
.text
	.align 5
	.globl main
	.ent main
main:
	.frame $15,48,$26,0
	.mask 0x4008000,-48
	ldgp $29,0($27)
$main..ng:
	lda $30,-48($30)
	stq $26,0($30)
	stq $15,8($30)
	mov $30,$15
	.prologue 1
	lda $1,60
	stl $1,16($15)
	lda $1,6
	stl $1,20($15)
	lda $1,10
	stl $1,24($15)
	ldl $24,16($15)
	ldl $25,20($15)
	divl $24,$25,$27
	mov $27,$1
	addl $1,$31,$2
	stq $2,32($15)
	ldt $f11,32($15)
	cvtqt $f11,$f10
	lda $1,$LC1
	ldt $f11,0($1)
	mult $f10,$f11,$f10
	lds $f12,24($15)
	cvtlq $f12,$f12
	cvtqt $f12,$f11
	lda $1,$LC2
	ldt $f12,0($1)
	mult $f11,$f12,$f11
	addt $f10,$f11,$f10
	cvttqc $f10,$f11
	stt $f11,32($15)
	ldq $1,32($15)
	mov $1,$2
	addl $2,$31,$1
	lda $16,$LC0
	lda $17,10
	mov $1,$18
	jsr $26,printf
	ldgp $29,0($26)
	lda $16,1
	jsr $26,exit
	ldgp $29,0($26)
$L2:
	mov $15,$30
	ldq $26,0($30)
	ldq $15,8($30)
	lda $30,48($30)
	ret $31,($26),1
	.end main
	.ident	"GCC: (GNU) 2.95.3 20010125 (prerelease)"

And, optimized, on Alpha:
	.file	1 "ctest.c"
	.set noat
	.set noreorder
.section	.rodata
$LC0:
	.ascii "%d = %d\12\0"
.text
	.align 5
	.globl main
	.ent main
main:
	.frame $30,16,$26,0
	.mask 0x4000000,-16
	ldgp $29,0($27)
$main..ng:
	lda $30,-16($30)
	lda $16,$LC0
	lda $17,10
	lda $18,10
	stq $26,0($30)
	.prologue 1
	jsr $26,printf
	ldgp $29,0($26)
	lda $16,1
	jsr $26,exit
	ldgp $29,0($26)
	.end main
	.ident	"GCC: (GNU) 2.95.3 20010125 (prerelease)"

And this is quite exactly equivalent to the Intel version, albeit using
"lda" to load the values into the argument registers ($16 through $18)
rather than using "pushl" to push them onto the stack.
--
(concatenate 'string "aa454" "@freenet.carleton.ca")
http://vip.hex.net/~cbbrowne/oses.html
Rules of the  Evil Overlord #17. "When I employ  people as advisors, I
will occasionally listen to their advice."
<http://www.eviloverlord.com/>