[NTLUG:Discuss] OT C question

Patrick R. Michaud pmichaud at pobox.com
Mon Feb 20 16:18:57 CST 2006


On Mon, Feb 20, 2006 at 03:38:26PM -0600, Johnny Cybermyth wrote:
> I was using 2 files for my test since I originally thought it was a 
> module linking problem.  Here are the original files:
> // file1.c------------------------------------
> unsigned char myarray[5];
> ...
>
> // main.c------------------------------------
> extern unsigned char myarray[];
> ...

> These files work perfectly as written.  If you modify the extern 
> statement in main.c to:
> 
> extern unsigned char *myarray;
> 
> the array won't update correctly.  

Aha, this is indeed the problem.  Since file1.c defines myarray
as being an array of characters, your extern statement needs to match.

"But char* is the same as char[]," I hear people say.  Actually,
in this case it's not.

The statement 

   unsigned char myarray[5];

allocates five bytes and makes "myarray" a constant symbol
naming the address of the first byte of the array.  Most notably,
"myarray" is -not- a pointer variable, as it would be if this had
been an unsigned char* definition; no separate storage holds that
address.  Perhaps a contrasting example
will help:

   unsigned char myarray[5] = "abcd";
   unsigned char* mypoint = myarray;

What's the difference between myarray and mypoint?  One difference
is the amount of memory allocated: myarray allocates five bytes
for the array, initializes them to "abcd\0", and sets myarray
as a constant reference to the first byte in the array.

The mypoint definition allocates four bytes for the pointer
(assuming 32-bit addresses), and initializes those four bytes
to point to the first byte of the array.

Now then, it's true that myarray[0] and mypoint[0] will both
reference the same byte -- namely, the first byte of the array.
However, from a linker perspective, they're different --
"myarray" is a symbol that references the first byte of the array,
while "mypoint" is a symbol that references the pointer.
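
As a rough single-file sketch of that difference (the helper function
names here are made up for illustration and are not from the original
files), the two objects can be compared directly:

```c
#include <stddef.h>

unsigned char myarray[5] = "abcd";
unsigned char *mypoint = myarray;

/* Storage occupied by each object. */
size_t array_size(void)   { return sizeof myarray; }  /* the 5 data bytes */
size_t pointer_size(void) { return sizeof mypoint; }  /* one pointer */

/* Indexing through either name reaches the same first byte. */
int same_first_byte(void) {
    return &myarray[0] == &mypoint[0];
}

/* The array symbol's address IS the address of the data... */
int array_symbol_is_data(void) {
    return (void *)&myarray == (void *)&myarray[0];
}

/* ...while the pointer symbol's address is a separate object that
   merely holds the data's address. */
int pointer_symbol_is_data(void) {
    return (void *)&mypoint == (void *)&mypoint[0];
}
```

Here array_symbol_is_data() is true and pointer_symbol_is_data() is
false, which is exactly the distinction the linker cares about.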

So, in an external file if you state:

    extern unsigned char myarray[];
    extern unsigned char* mypoint;

then this is telling the linker that "myarray" is a symbol
referencing the first byte of an array of unsigned characters,
while "mypoint" is a symbol referencing a pointer to unsigned
char.  If, on the other hand, you state

    extern unsigned char* myarray;    // mismatch

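then the compiler expects "myarray" to be a symbol referencing
a 4-byte pointer, whereas in the definition it is actually a 
symbol referencing an array.  The compiled code in main.c then
loads the first four bytes of the array's contents, treats them
as an address, and reads or writes through that bogus address
instead of the array itself.  (The linker just maps symbol
names to memory locations and doesn't check how those locations
are used.)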
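
The real mismatch needs two separately compiled files, but its effect
can be imitated safely in one file.  The sketch below is only an
illustration under stated assumptions: "storage", "bogus_pointer_bits"
and "real_address" are hypothetical names, and the buffer is sized to
hold one pointer-sized integer so the copy stays in bounds (the
original 5-byte array would be too small for that on a 64-bit system).
It shows the value a mismatched declaration would treat as the
pointer, without ever dereferencing it:

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for the array's storage (hypothetical; sized so that
   reading one pointer's worth of bytes stays within the object). */
unsigned char storage[sizeof(uintptr_t)] = { 'a', 'b' };

/* What code compiled against "extern unsigned char *myarray;" would
   load as the pointer: the array's own leading bytes. */
uintptr_t bogus_pointer_bits(void) {
    uintptr_t bits = 0;
    memcpy(&bits, storage, sizeof bits);
    return bits;
}

/* What correctly declared code uses: the address of the storage. */
uintptr_t real_address(void) {
    return (uintptr_t)storage;
}
```

The bogus value is built from the characters stored in the array, not
from the array's location, so any access through it lands somewhere
other than the array.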

Note that this doesn't change the way we normally think about
parameter passing -- i.e., when passing an array as an argument
we can use either a pointer or array reference as the formal
parameter because C automatically passes array references as
though they are pointers.  Thus:

    extern unsigned char myarray[];

    extern void myfunc(unsigned char* x);

    int main(int argc, char* argv[]) {
        myfunc(myarray);
        // ...
    }

works just fine, because C always converts an array argument to a
pointer to its first element, so the function receives a pointer
value either way.  But linking is different: there we aren't passing
parameters, but rather we're assigning addresses to symbols, and the
declared types have to match the definition in order for it to work.
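
A minimal sketch of the parameter-passing side (the function names
below are invented for illustration): a parameter written with array
syntax and one written with pointer syntax have the same type, and
both receive the address of the first element.

```c
unsigned char myarray[5] = "abcd";

/* Parameter spelled as an array; C adjusts it to unsigned char *. */
unsigned char *via_array_syntax(unsigned char x[]) {
    return x;
}

/* Parameter spelled as a pointer. */
unsigned char *via_pointer_syntax(unsigned char *x) {
    return x;
}

/* The two functions have identical types, so one function pointer
   type can refer to either without a cast. */
unsigned char *(*either)(unsigned char *) = via_array_syntax;
```

Calling either function with myarray returns &myarray[0]; none of the
symbol-versus-storage confusion arises because only a value is passed.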

Hope this helps,

Pm

P.S.:  The declaration of main as returning "void" is technically
incorrect (although many operating systems allow you to fudge this).
According to the relevant standards, "main" should always be 
declared as returning "int"; i.e., either

    int main(void)
    int main(int argc, char* argv[])

While many operating systems and linkers work even if main is
declared as returning void, there are some environments where
this can make a huge difference because the compiler generates
a function call/return sequence that is incompatible with what
the startup code expects.




