The Pointy Pointers
Edited By: Prof. Garth Santor
This post assumes that the concepts and ideas in Do You Get My Reference are well understood.
Introduction
Simply put, a pointer is a variable that holds a memory address as its value. However, it’s a special type of variable that has a slightly different way of initialization and access. Let’s look at an in-depth example.
int main() {
    int i = 6;
    int* ptr = nullptr;
    ptr = &i;
    *ptr = 10;
}
In the previous code, we are keeping track of the reference (memory location) of i in the variable ptr. The type of that variable is an int pointer, signified by the ‘*’ after the type. The value of ptr is the memory address of i, obtained with the address-of operator (&i). To get to the variable itself instead of the pointer value, we need to dereference the pointer with the * operator. This all sounds confusing, so let’s look at it in action.
In the above screenshot, we can see that i resides at location 0x00D8FDE0. ptr holds the value 0x00D8FDE0, which is the location of i. Notice that ptr has its own memory location, and can be copied and passed in to functions. In memory, ptr acts exactly like any other variable. The only difference to us is its value and method of access.
So when line 5 of the code executes: *ptr = 10:
We can see that the value of i has changed to 10, even though we assigned it through ptr. So by dereferencing ptr, we were able to get at the variable at the memory location held by ptr, and manipulate it. Pretty cool, right?
But so far, we’ve been playing in a somewhat safe and predictable zone: the stack. So how do we work with the heap? If you’re not sure what the difference is, I’ve written a stack vs heap post that you can check out. Otherwise, let’s dive deeper!
Pointing to the heap
C Style
There are several ways to obtain the address of an object allocated on the heap. The traditional C way is to use malloc, which asks the allocator for a free block of memory of the size you requested. If such a block is available, its address is returned to the caller. If no memory is available or an error occurred, a null pointer is returned (learn more here). Let’s look at an example:
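#include <cstdlib>   // malloc, free

int main() {
    // Request enough heap memory for a single int.
    int* ptr = (int*) malloc(sizeof(int));
    if (ptr != nullptr) {
        *ptr = 10;
    }
    free(ptr);   // safe even if ptr is null
}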
malloc takes a size_t for a parameter, and returns a void pointer to the beginning of the allocated block of memory. In this example, we’re asking for a heap allocation with the size of an int, which is 4 bytes (typically). We should also end up with two objects: the object on the heap that we are allocating, and the pointer to it, which will store its address.
So here we see that ptr has been created on the stack at 0x006FFA64. The value of that pointer is yet another reference, which addresses a location on the heap: 0x009B54E8. Notice how far apart these two locations are compared to the previous example. That’s because the stack and the heap occupy separate regions of the process’s address space.
So after checking whether our allocation was successful (which is good practice to always do), we’ve changed the value of that heap memory block to the value 10.
The last important part of this code is the free() function call. Since the stack is managed memory, the program takes care of cleaning it up for us as variables go out of scope. The heap, however, is ours to clean up. That’s why we need to call free() on our pointer, telling the allocator that we’re done with that memory and that it can be reused. Failure to do so keeps the memory reserved for as long as the process runs (a modern operating system reclaims it when the process exits). This failure is called a memory leak.
Let’s watch what happens when free() is called:
This is a very simple example of how heap allocation works in C. Larger and more dynamic memory allocations can be made this way, which allows for big data storage without statically storing everything on the stack.
This is also the way to achieve dynamic array allocations in C. You can read more about that in my Heap and Stack post.
C++ Style
To demonstrate the C++ approach, let’s create a simple class:
class Foo {
public:
    int i;
};
An instance of Foo will take up however much space its components do, which in this case happens to be a single int value. So Foo will typically take up 4 bytes of memory.
So we see in the watch window that indeed, the instance obj takes up exactly 4 bytes of memory, as much as its components, in this case the single integer. obj is also located on the stack, at 0x0073F758. So what exactly does the new keyword do?
If we step over line 12, this is what happens:
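ptr was created on the stack, with a certain memory address stored in it. This memory address points to somewhere on the heap, and was provided to us by the new keyword. You can think of new as the C++ counterpart of malloc: it obtains memory from the heap (many implementations use malloc underneath) and, unlike malloc, also constructs the object in that memory.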
So if we inspect the memory address that ptr is holding, we will note that it’s in a completely different memory region than the stack:
Stepping over line 15, we see that our memory address holds the new value assigned to Foo’s i.
Notice the use of delete at the end. This is a very important step to take, as omitting it will result in a memory leak. Notice what happens when we step over that line:
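This shows us that, for a simple type like Foo, C’s malloc() and C++’s new keyword give us the same results in memory. The difference is that new also runs constructors, and memory obtained with new must be released with delete, not free().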
Incrementing Pointers
One of the most important uses of pointers is to traverse through an array or data set, especially when storing large data.
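#include <cstdlib>   // malloc, free

int main() {
    // Allocate room for 5 ints on the heap.
    int* arr = (int*)malloc(5 * sizeof(int));
    if (arr != nullptr) {
        for (int i = 0; i < 5; ++i) {
            arr[i] = (i + 1) * 3;   // 3, 6, 9, 12, 15
        }
        int* iterator = arr;   // copy of the pointer, not of the array
        ++iterator;            // now points to the second element
        ++(*iterator);         // increments the second element: 6 becomes 7
    }
    free(arr);
}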
The first line of main allocates 5 integer-sized slots of memory on the heap, which gives us a pointer to the beginning of that block. We've basically allocated our own array, and can now access it through the [] operator. After the loop, our array contains the first five multiples of 3: 3, 6, 9, 12 and 15. Note that evaluating arr[4] or arr + 4 (with no assignment) does not change the value of arr. The program computes a temporary address without modifying arr. An expression such as arr - 3 would leave arr unchanged as well, but it points before the start of the array, which is undefined behavior. The proof that arr is unchanged is the following assignment of arr to iterator. Notice the watch window at the bottom and where arr and iterator are pointing to:
Both arr and iterator are pointing to the start of our array. So what happens if we run line 13: ++iterator, which is equivalent to iterator += 1?
In the watch window we can see that iterator has advanced by one, and is now pointing to the next element in our array, without manipulating arr. That tells us that iterator is a copy of arr and does not affect the first pointer whatsoever. We can also see that nothing in our memory has changed, so no values were manipulated by advancing our iterator. So what does line 14 do?
By incrementing the dereferenced value of iterator, we see that indeed, our value has changed in memory. That has not affected iterator at all, only the memory that iterator now points to.
Lastly, notice what free does:
This is one of the most common uses of pointers: as iterators over large data collections. Modern C++ offers safer and more managed pointers and iterators.
Iterators
One common example is STL container iterators:
#include <vector>

int main() {
    std::vector<int> examScores {10, 14, 7, 9, 17, 0};
    long average = 0;
    for (std::vector<int>::iterator score = examScores.begin(); score != examScores.end(); ++score) {
        average += *score;
    }
    average /= examScores.size();
}
What we've basically done here is allocate a dynamic array on the heap using STL's vector object, and asked it for an iterator to its beginning (examScores.begin()). Then we were able to traverse our array by incrementing our iterator, and dereferencing it to retrieve the value it points to.
These structures are managed, meaning the vector releases its own memory when it goes out of scope and we don't need to clean it up ourselves, unless we're storing raw pointers in our vector, in which case the objects they point to are still ours to delete.
Smart Pointers
Another managed C++ pointer structure is smart pointers. Smart pointers are encapsulated pointers that manage their own lifespan during the execution of a program.
There are 3 types of smart pointers:
Unique pointers (std::unique_ptr): allow only one owner of the pointer, and do not allow copying, only moving. Once that owner goes out of scope, the underlying memory is cleaned up
Shared pointers (std::shared_ptr): allow multiple owners, and keep track of the number of owners they have. Once that count reaches 0, the underlying memory is cleaned up automatically
Weak pointers (std::weak_ptr): used in conjunction with shared pointers, and allow for observing a shared pointer's object without incrementing the shared pointer's owner (reference) count
Visit Microsoft’s Smart Pointer documentation for more info.
Pointers are extremely powerful, and allow us to do a multitude of cool things with code. But with great power comes great responsibility, and pointer misuse can easily lead to run-time failures and errors.
References
Kernighan, B. W.; Ritchie, D. M.: The C Programming Language