Tuesday, November 20, 2012

C Pointers Explained, Really

While I was in college, a friend of mine complained that he was confused while programming in C, struggling to learn the syntax for pointers.

He gave the example of something like: *x=**p++ being ugly and unreadable, with too many operations layered on each other, making it hard to tell what was happening.  He said he had done a bit of programming with assembly language, but he wasn't accustomed to the nuances of C.

I wrote the following explanation on our student message board, and I got a lot of good feedback.  Some people said that they had been programming in C for years, but not until they read my post did they finally understand pointers.  So here it is, unearthed from my backups and slightly edited.  I hope it helps someone again...

Message 1956 (8 left): Thu Jan 25 1990  2:44am
From: Bill! (easterb@ucscb)
Subject: Okay

Well, if you know assembly, you have a head start
on many of the cis freshpersons here.  You at least know
about memory maps:  RAM is a long long array of bytes.
It helped me to learn about pointers if I kept this in mind.
For some reason, books and instructors talking about
pointers want to overlook this.

When I have some code:

int n;
int *p;

There is a place in my memory that looks like this:

Address:   :
  0x5100|     |   n is an integer, one machine word big
  0x5104|     |   p is a pointer, also one word big
  0x5108|     |   other unused memory

Let's give these variables some values.
I set n to be the number 151.

        n = 151;

I set the pointer p to point to the integer n.

        p = &n;

That says, "the value of the variable p is assigned the
address of the variable n".

Address:     :     Value at that address:
  0x5100  | 151|  n
  0x5104  |5100|  p
  0x5108  |   ?|

Now I want to print out the value of n, in two ways.

        printf("n is %d.\n", n);
        printf("n is %d.\n", *p);

The * operator says, "give me the object at the following address."
The object's type is the type that the pointer was declared as.
So, since we declared "int *p", the object pointed at will be
_assumed_ by C to be an int.  In this case, we were careful to
make this coincide with what we were pointing at.
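
Here is a rough sketch of the same idea as functions you can call.
The names read_through, write_through and demo_deref are made up for
illustration; the only point is that once p holds the address of n,
*p and n are the same object.

```c
#include <stdio.h>

/* Hypothetical helper: fetch the int that p points at. */
int read_through(const int *p)
{
        return *p;
}

/* Hypothetical helper: store a new value in the int that p points at. */
void write_through(int *p, int value)
{
        *p = value;
}

/* Replays the n / p example, then changes n by way of the pointer. */
int demo_deref(void)
{
        int n = 151;
        int *p = &n;

        printf("n is %d.\n", n);
        printf("n is %d.\n", read_through(p));

        write_through(p, 451);  /* changes n itself, not a copy */
        return n;
}
```

Calling demo_deref() prints the value of n twice and hands back
whatever n holds after the write through p.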

Now I want to print out the memory address of n.

        printf("n is located at %p.\n", (void *)&n);
        printf("n is located at %p.\n", (void *)p);

The & operator says, "tell me the address where the following object
starts."  In this case, it is hex 5100 (in the Assembly notation I am
used to, that would be written $5100; %p prints it however your
system likes, often with a leading 0x).
Notice the _value_ of p is an address.

Hm.  Does p have an address?  Sure.  It is a variable, and all
variables have their own address.  The address of p is hex 5104.

        printf("p is located at %p.\n", (void *)&p);

Here we are taking the address of a pointer variable, 
using the & operator.
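
If the address of p is itself a value, you can store it in yet another
variable.  A minimal sketch, with names (pp, same_target,
show_addresses) that are mine and not part of the example above:

```c
#include <stdio.h>

/* Returns 1 if following pp twice lands on the same int that p points at. */
int same_target(int **pp, int *p)
{
        return *pp == p && **pp == *p;
}

/* Prints where n and p live; the actual addresses vary from run to run. */
void show_addresses(void)
{
        int n = 151;
        int *p = &n;
        int **pp = &p;   /* the address of p, kept in a pointer to pointer */

        printf("n lives at %p\n", (void *)&n);
        printf("p lives at %p and holds %p\n", (void *)pp, (void *)p);
        printf("same_target: %d\n", same_target(pp, p));
}
```

The declaration "int **pp" reads the same way as "int *p", just one
level further out: pp holds the address of a variable that holds the
address of an int.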

char name[] = "Bill";
char *p;
int *q;

Now we have an array to play with.  Here's how memory looks now:

 0x5100 |'B'|  "name" is an address constant that has value hex 5100
 0x5101 |'i'|  char: 1 byte
 0x5102 |'l'|  char: 1 byte
 0x5103 |'l'|  char: 1 byte
 0x5104 |\0 |  char: 1 byte
 0x5105 |   |  p is a pointer: 1 word
 0x5109 |   |  q is a pointer: 1 word

        p = name;

We set p to the value of name.  Now p has value hex 5100 too.
We can use the * dereferencing operator on p, and get the
character 'B' as a result.

Now what happens if I do this:

        ++p;

The pointer p is incremented.  What value does it have now?
Hex 5101.  Pretty simple.
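
Stepping a char pointer one byte at a time is how most string code
works.  Here is a small sketch; count_chars and second_char are
hypothetical names, not library functions:

```c
#include <stddef.h>

/* Count characters by walking a char pointer until it hits the '\0'. */
size_t count_chars(const char *s)
{
        const char *p = s;

        while (*p != '\0')
                ++p;            /* moves ahead one char, i.e. one byte */

        return (size_t)(p - s);
}

/* Return the character one step past the start of s (s must be non-empty). */
char second_char(const char *s)
{
        const char *p = s;

        ++p;
        return *p;
}
```

With the array name[] holding "Bill", count_chars(name) walks from
'B' to the null byte, and second_char(name) gives back 'i'.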

Now let's try something irresponsible:

        q = (int *)name;    /* the cast only quiets the compiler */

But q is a pointer to int!  If we dereference q, it will take
the word (typically 4 bytes) beginning at address "name" (which
is hex 5100) and try to read it as an int.  'B', 'i', 'l', 'l'
read as an int will be some large number, dependent on the
byte order of your machine's architecture.  On ucscb, which puts
the most significant byte first, it becomes 1114205292.  (To see
how, line up the binary representation of the ASCII values for
those 4 characters, run the 32 bits together, and read the
resulting binary number as an integer.)
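
You can work that arithmetic out in code without actually aiming an
int pointer at a char array.  This is a sketch that assumes ASCII and
a 32-bit result; the three function names are made up for the example.
Reading the bytes most-significant-first gives 1114205292, the number
quoted above; reading them least-significant-first gives 1819044162.

```c
#include <stdint.h>
#include <string.h>

/* First byte is the most significant (big-endian). */
uint32_t pack_big_endian(const unsigned char b[4])
{
        return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
               ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}

/* First byte is the least significant (little-endian). */
uint32_t pack_little_endian(const unsigned char b[4])
{
        return ((uint32_t)b[3] << 24) | ((uint32_t)b[2] << 16) |
               ((uint32_t)b[1] << 8)  |  (uint32_t)b[0];
}

/* What this machine itself makes of the 4 bytes.  memcpy sidesteps the
   alignment and aliasing trouble that dereferencing an int pointer
   aimed at a char array can cause. */
uint32_t pack_native(const unsigned char b[4])
{
        uint32_t v;

        memcpy(&v, b, sizeof v);
        return v;
}
```

On a given machine pack_native should agree with one of the other
two, and which one it agrees with tells you the machine's byte order.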

What we have just seen here is a key issue of pointers that I
mentioned earlier:  C assumes that what they are pointing at
is an object of the type that the pointer was designed to point at.
It is up to the programmer to make sure this happens correctly.

        ++q;

The int pointer is incremented.  What value does it have now?
Hex 5104.  Huh?!?  The answer is simple if you accept the above
paragraph.  It gets incremented by the size of the object it
_thinks_ it is pointing at.  It's an int pointer, so incrementing
it makes it advance a number of bytes equal to the size of an int.
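
You can measure the step size yourself.  A minimal sketch, with
hypothetical function names, that compares where a pointer is before
and after adding one to it:

```c
#include <stddef.h>

/* Bytes between an int pointer and the same pointer advanced by one. */
size_t int_stride(void)
{
        int cells[2] = {0, 0};
        int *q = cells;
        int *next = q + 1;

        /* Cast to char * so the subtraction counts bytes, not ints. */
        return (size_t)((char *)next - (char *)q);
}

/* Same measurement for a char pointer. */
size_t char_stride(void)
{
        char cells[2] = {0, 0};
        char *p = cells;
        char *next = p + 1;

        return (size_t)(next - p);
}
```

int_stride() comes out equal to sizeof(int) on whatever machine you
run it on, and char_stride() is always 1.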

Now print the dereferenced value of q (i.e. the value of the object
q is pointing to).  Well, it's pointing at a null byte, and then
the first 3 bytes of the char *p.  Now we're all messed up.
Nice going.  Try to convert _that_ to an integer representation.
Well actually, C will do it happily.  But it'll be another weird
number.

#include <stdio.h>

void f(int x)
{
        printf("%d.\n", x);
}

int main(void)
{
        int n;
        n = 151;
        f(n);
}

Here is a simple program that passes an int "by value".
That is, it copies the value of n into the new variable x!

 0x5100 |151|  n is an integer
 0x5104 |151|  x is another integer

When we mention x, we are using the value at location 5104,
and we can change it, read it, whatever, and it won't affect n,
the int at location 5100.
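
To convince yourself that the copy really is separate, have the
function scribble on its argument and then look at the caller's
variable afterwards.  A small sketch; both function names are
invented for this example:

```c
/* Hypothetical function: changes its own copy of the argument. */
int add_three_hundred(int x)
{
        x = x + 300;    /* only the local copy changes */
        return x;
}

/* Calls the function above and reports what the caller's n holds after. */
int caller_value_after_call(void)
{
        int n = 151;

        (void)add_three_hundred(n);     /* result deliberately ignored */
        return n;                       /* n was never touched */
}
```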

But what if we want to have f() modify the value and then
have that new value be available in main()?  C does this by
passing the variable "by reference".

#include <stdio.h>

void f(int *x)
{
        printf("%d.\n", *x);
        *x = 451;
}

int main(void)
{
        int n;
        n = 151;
        f(&n);
}

Pass the _address_ of n, and declare x as a _pointer_ to int.
Actually, this is still passing by value, but the value being
passed is the address, not the number.

 0x5100 | 151|  n is an integer
 0x5104 |5100|  x is a pointer to int

Now in f(), when we make use of *x, we are referring to the
value at location 5100.  This is the location of n.
After the assignment "*x = 451;", this is what we have:

 0x5100 | 451|  n is an integer
 0x5104 |5100|  x is a pointer to int

x still points to location 5100, but we have changed the value
of the object at that location.
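
The classic use of this is a function that exchanges two variables,
which cannot be done with plain by-value ints.  A sketch, with
swap_ints as a made-up name:

```c
/* Exchange the two ints that a and b point at. */
void swap_ints(int *a, int *b)
{
        int tmp = *a;

        *a = *b;
        *b = tmp;
}
```

You call it with addresses, as in swap_ints(&n, &m), for the same
reason f() needs the address of n: the function only ever gets
copies, so the copies have to be addresses for it to reach the
originals.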

Well, those are the basics.
You mentioned things like "*x=**p++" being ugly and unreadable.
Well, yeah, but here is a diagram that may help:

        |----|  here is a word in memory with initial value 0. 
 0x5100 |   0|  no variable name
 0x5104 |  12|  here is a value, a word in memory.  no variable name.
 0x5108 |5104|  Here is an int pointer, pointing at the previous word.
 0x510c |5108|  here is p, a pointer to int pointer.
 0x5110 |5100|  here is x, a pointer.  guess where it's pointing.

First let's see what p and x were declared as:
int *x;    /* pointer to int */
int **p;   /* pointer to pointer.  
              The subordinate pointer is a pointer to int.*/

You should know now what "*x" means.  It means, "the value of location 5100."
And you know what "*p" means, "the value of location 5108".
Now that value is another address!  Okay, let's dereference that
address: "**p" and we find (by the declaration) an int.

Now "*x = **p" looks like, "this int at 5100 gets the value of
that int at 5104."

And what does "**p++" mean?  Well, ++ binds tighter than *, so this
is equivalent to:  *( *( p++ ) )
Or, "pointer to pointer to int, and by the way, after we're done,
p has been incremented.  But we looked where it was pointing
before it got incremented, so we don't care.  Let the next statement
worry about it."
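
Where that increment earns its keep is in a loop.  Here is a sketch
that uses the exact expression; the function name gather and the
convention of ending the pointer array with NULL are my own
assumptions for the example, not anything C requires.

```c
#include <stddef.h>

/* Copy the ints reached through an array of int pointers into x[].
   The pointer array must end with a NULL entry, and x[] must have
   room for every int copied.  Returns how many ints were copied. */
size_t gather(int **p, int *x)
{
        size_t count = 0;

        while (*p != NULL) {
                *x = **p++;     /* read through p, then p steps to the next pointer */
                x++;
                count++;
        }
        return count;
}
```

Each trip around the loop does what the text describes: look where p
is pointing, follow that pointer to an int, store the int through x,
and leave p incremented for the next statement to worry about.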

This content is copyright 2012 by Bill Karwin.  I'll share it under the terms of the Creative Commons License, Attribution-NonCommercial-ShareAlike 3.0 Unported.