The Pointy Pointers

Edited By: Prof. Garth Santor

This post assumes that the concepts and ideas in Do You Get My Reference are well understood.

Introduction

Simply put, a pointer is a variable that holds a memory address as its value. It is declared and accessed a little differently from other variables, so let’s walk through an example.

int main() {
    int i = 6;
    int* ptr = nullptr;
    ptr = &i;
    *ptr = 10;
}
A pointer is a variable that holds a memory address

In the previous code, we keep track of the memory location of i in the variable ptr. The type of ptr is pointer to int, signified by the ‘*’ after the type. The value of ptr is the memory address of i, obtained with the address-of operator (&i). To get at the variable itself rather than its address, we need to dereference the pointer with *ptr. This all sounds really confusing, so let’s look at it in action.

In the above screenshot, we can see that i resides at location 0x00D8FDE0. ptr holds the value 0x00D8FDE0, which is the location of i. Notice that ptr has its own memory location, and can be copied and passed into functions. In memory, ptr acts exactly like any other variable. The only difference to us is its value and method of access.

So when line 5 of the code executes: *ptr = 10 :

We can see that the value of i has changed to 10, even though we assigned it through ptr. By dereferencing ptr, we were able to get at the variable stored at the memory location held by ptr, and manipulate it. Pretty cool, right?
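The same idea can be sketched as a small helper that writes through a pointer. The function names set_through_pointer and demo_pointer_write are hypothetical, chosen only for this illustration; the values 6 and 10 mirror the example above.

```cpp
#include <cassert>

// Writes new_value into the int that target points to.
// set_through_pointer is a hypothetical helper name used only in this sketch.
void set_through_pointer(int* target, int new_value) {
    assert(target != nullptr);  // dereferencing a null pointer is undefined behavior
    *target = new_value;        // dereference: changes the pointed-to int, not the pointer
}

// Shows that writing through a pointer changes the original variable.
int demo_pointer_write() {
    int i = 6;
    int* ptr = &i;                 // ptr holds the address of i
    set_through_pointer(ptr, 10);  // the pointer is copied, the int is shared
    return i;                      // i was modified through the pointer
}
```

Note that the pointer itself is passed by value: the function receives a copy of the address, yet both copies refer to the same int.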

But so far, we’ve been playing in a somewhat safe and predictable zone: the stack. So how do we work with the heap? If you’re not sure what the difference is, I’ve written a stack vs heap post that you can check out. Otherwise, let’s dive deeper!

Pointing to the heap

C Style

There are several ways to obtain the address of an object allocated on the heap. The traditional C way is to use malloc, which asks the allocator for a block of memory of the size you requested. If a block is available, malloc reserves it and returns its address to the caller. If no memory is available or an error occurred, a null pointer is returned (learn more here). Let’s look at an example:

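#include <cstdlib>  // malloc, free

int main() {
    // Ask for enough heap memory to hold one int; malloc returns void*.
    int* ptr = (int*) malloc(sizeof(int));

    // malloc returns a null pointer on failure, so check before dereferencing.
    if (ptr != nullptr) {
        *ptr = 10;
    }

    free(ptr);  // safe even if ptr is null
}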
malloc is a C function that takes in size_t for a parameter, and returns a void pointer to the beginning of that allocated block of memory

In this example, we’re asking for a heap allocation with the size of an int, which is 4 bytes (typically). We should also end up with two objects: the object on the heap that we are allocating, and the pointer to it which will store its address.

So here we see that ptr has been created on the stack at 0x006FFA64. The value of that pointer is another address, one that refers to a location on the heap: 0x009B54E8. Notice how far apart these two locations are compared to the previous example. That’s because the stack and the heap occupy separate regions of the process’s address space.

So after checking whether our allocation was successful (which is good practice to always do), we’ve changed the value of that heap memory block to the value 10.

The last important part of this code is the free() function call. Stack memory is managed automatically: the program reclaims it for us when a variable goes out of scope. The heap, however, is ours to clean up. That’s why we need to call free() on our pointer, telling the allocator that we’re done with that memory and that it can be reused. Failing to do so keeps the memory reserved for as long as the program runs (most operating systems reclaim it only when the process exits). This failure is called a memory leak.

Let’s watch what happens when free() is called:

This is a very simple example of how heap allocation works in C. Larger and more dynamic memory allocations can be made this way, which allows for big data storage without statically storing everything on the stack.

This is also the way to achieve dynamic array allocations in C. You can read more about that in my Heap and Stack post.
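The allocate, check, use, free cycle described in this section could be wrapped in two small helpers, roughly as follows. This is a minimal sketch: make_heap_int and release_heap_int are hypothetical names, not part of any library, and resetting the pointer after free is a defensive habit rather than something the C library requires.

```cpp
#include <cstdlib>  // std::malloc, std::free

// Allocates one int on the heap and stores value in it.
// Returns a null pointer if the allocation fails. The caller owns the memory.
int* make_heap_int(int value) {
    int* ptr = static_cast<int*>(std::malloc(sizeof(int)));
    if (ptr != nullptr) {
        *ptr = value;  // only write once we know the allocation succeeded
    }
    return ptr;
}

// Frees the allocation and resets the caller's pointer so it cannot dangle.
void release_heap_int(int*& ptr) {
    std::free(ptr);  // freeing a null pointer is a no-op, so no check is needed
    ptr = nullptr;
}
```

A caller would check the result of make_heap_int against nullptr before using it, and call release_heap_int exactly once when done.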

C++ Style

In C++, objects are usually allocated on the heap with the new keyword and released with delete. To demonstrate that, let’s create a simple class:

class Foo {
public:
    int i;
};

An instance of Foo takes up at least as much space as its members, which in this case is a single int. So a Foo will typically take up 4 bytes of memory.

So we see in the watch window that the instance obj indeed takes up exactly 4 bytes of memory, as much as its only member, the integer. obj is also located on the stack, at 0x0073F758. So what exactly does the new keyword do?

If we step over line 12, this is what happens:

ptr was created on the stack, with a memory address stored in it. This address points to somewhere on the heap, and was provided to us by the new keyword. In fact, new plays the role in C++ that malloc plays in C: it obtains heap memory (many implementations call malloc internally) and then also constructs the object in it.

So if we inspect the memory address that ptr is holding, we will note that it’s in a completely different memory region than the stack:

Stepping over line 15, we see that our memory address holds the new value assigned to Foo’s i.

Notice the use of delete at the end. This is a very important step to take, as omitting it will result in a memory leak. Notice what happens when we step over that line:

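This shows us that, for a simple type like Foo, C’s malloc() and C++’s new keyword give us the same results in memory. The key differences are that new also runs the object’s constructor, and that memory obtained with new must be released with delete rather than free().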
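The program stepped through in this section is not reproduced in full, so the following is only a sketch of what such a new/delete walkthrough might look like. The Foo class is the one defined above; the function name demo_new_delete and the values 1 and 42 are placeholders assumed for illustration.

```cpp
class Foo {
public:
    int i;
};

// Creates one Foo on the stack and one on the heap, then cleans up.
// demo_new_delete and the values used are assumptions made for this sketch.
int demo_new_delete() {
    Foo obj{};             // lives on the stack, destroyed automatically
    obj.i = 1;

    Foo* ptr = new Foo{};  // lives on the heap; new throws std::bad_alloc on failure
    ptr->i = 42;           // same as (*ptr).i = 42
    int result = ptr->i + obj.i;

    delete ptr;            // pair every new with exactly one delete
    ptr = nullptr;         // avoid leaving a dangling pointer behind
    return result;
}
```

Unlike malloc, a plain new expression does not return a null pointer on failure, so there is no null check after the allocation.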

Incrementing Pointers

One of the most important uses of pointers is to traverse through an array or data set, especially when storing large data.

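#include <cstdlib>  // malloc, free

int main() {
    // Room for 5 ints on the heap; arr points to the first one.
    int* arr = (int*)malloc(5 * sizeof(int));

    if (arr != nullptr) {
        for (int i = 0; i < 5; ++i) {
            arr[i] = (i + 1) * 3;
        }

        int* iterator = arr;  // copies the address, not the array
        ++iterator;           // now points to arr[1]
        ++(*iterator);        // arr[1] goes from 6 to 7
    }

    free(arr);
}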

The first line allocates 5 integer-sized slots of memory on the heap and gives us a pointer to the beginning of that block. We've basically allocated our own array, and can now access it through the [] operator. After the loop, our array contains the first five multiples of 3: 3, 6, 9, 12 and 15. Note that evaluating arr[4] or arr + 4 without assigning the result back does not change the value of arr. The program computes the resulting address without modifying arr. The proof is the following assignment of arr to iterator. Notice the watch window at the bottom and where arr and iterator are pointing to:

Both arr and iterator are pointing to the start of our array. So what happens if we run line 13, ++iterator, which is equivalent to iterator += 1?

In the watch window we can see that iterator has advanced by one element, and is now pointing to the next element in our array, without manipulating arr. That tells us that iterator is a copy of arr and does not affect the first pointer whatsoever. We can also see that nothing in our memory has changed, so no values were manipulated by advancing our iterator. So what does line 14, ++(*iterator), do?

By incrementing the dereferenced value of iterator, we see that indeed, our value has changed in memory. That has not affected iterator at all, only the memory that iterator now points to.
Lastly, notice what free does:

This is the most common usage of pointers: as iterators to large data collections. Modern C++ offers safer and more managed pointers and iterators.
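Using a pointer as an iterator over a block of ints can be sketched as a short summing loop. The function name sum_with_pointer is hypothetical; the pattern of walking from the first element to one past the last is the same one the STL iterators in the next section follow.

```cpp
#include <cassert>
#include <cstddef>

// Sums count ints starting at first by walking a pointer across them.
// sum_with_pointer is a hypothetical name used only for this sketch.
long sum_with_pointer(const int* first, std::size_t count) {
    assert(first != nullptr || count == 0);  // a null block must be empty
    long total = 0;
    const int* last = first + count;         // one past the final element
    for (const int* it = first; it != last; ++it) {
        total += *it;                        // read the element it points to
    }
    return total;
}
```

The pointer last is only compared against, never dereferenced, which is what makes the one-past-the-end address safe to use as a stopping point.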

Iterators

One common example is STL container iterators:

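#include <vector>

int main() {
    std::vector<int> examScores {10, 14, 7, 9, 17, 0};
    long total = 0;
    // begin() refers to the first element, end() to one past the last.
    for (std::vector<int>::iterator score = examScores.begin(); score != examScores.end(); ++score) {
        total += *score;  // dereference the iterator to read the element
    }

    // Convert before dividing so the fractional part is kept.
    double average = static_cast<double>(total) / examScores.size();
}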
A raw pointer's lifetime is controlled by the programmer

What we've basically done here is allocate a dynamic array on the heap using the STL's vector, and asked it for an iterator to its beginning (examScores.begin()). We were then able to traverse the array by incrementing the iterator, and dereferencing it to retrieve the value it points to.
These structures are managed, meaning the vector releases its memory automatically when it goes out of scope. The exception is when we store raw pointers in the vector: the objects they point to are still ours to delete.
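Because begin() and end() describe a range, the same traversal can be handed to a standard algorithm instead of a hand-written loop. The sketch below uses std::accumulate from <numeric>; the function name average_score and the choice to return 0.0 for an empty vector are assumptions made for this example.

```cpp
#include <numeric>  // std::accumulate
#include <vector>

// Computes the mean of the scores; returns 0.0 for an empty vector.
// average_score is a hypothetical name used only for this sketch.
double average_score(const std::vector<int>& scores) {
    if (scores.empty()) {
        return 0.0;  // avoid dividing by zero
    }
    // accumulate walks the iterator range and adds each element to 0L.
    long sum = std::accumulate(scores.begin(), scores.end(), 0L);
    return static_cast<double>(sum) / static_cast<double>(scores.size());
}
```

Passing the vector by const reference avoids copying it, and the vector still cleans up its own heap storage when its owner goes out of scope.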

Smart Pointers

Smart pointers are another managed pointer facility in C++. They wrap a raw pointer and manage the lifetime of the object it points to during the execution of a program.

There are 3 types of smart pointers:

  • Unique pointers (std::unique_ptr): allow only one owner of the pointed-to object, and cannot be copied. Once this owner goes out of scope, the object is cleaned up

  • Shared pointers (std::shared_ptr): allow multiple owners, and keep track of how many owners there are. Once that count reaches 0, the underlying object is cleaned up automatically

  • Weak pointers (std::weak_ptr): used in conjunction with shared pointers, and allow for observing an object managed by shared pointers without incrementing its owner (reference) count

Visit Microsoft’s Smart Pointer documentation for more info.
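The three kinds of smart pointers listed above can be sketched in a few lines each. This assumes a C++14 or newer compiler (for std::make_unique); the Foo struct and the demo_ function names are placeholders for illustration.

```cpp
#include <memory>   // std::unique_ptr, std::shared_ptr, std::weak_ptr
#include <utility>  // std::move

struct Foo {
    int i = 0;
};

// unique_ptr: a single owner; ownership can be moved but not copied.
int demo_unique() {
    std::unique_ptr<Foo> owner = std::make_unique<Foo>();
    owner->i = 10;
    std::unique_ptr<Foo> new_owner = std::move(owner);  // owner is now empty
    return (owner == nullptr) ? new_owner->i : -1;
}   // new_owner goes out of scope here and deletes the Foo

// shared_ptr: use_count() reports how many owners currently exist.
long demo_shared() {
    std::shared_ptr<Foo> first = std::make_shared<Foo>();
    std::shared_ptr<Foo> second = first;  // copying adds a second owner
    return first.use_count();
}

// weak_ptr: observes without owning; lock() yields an empty shared_ptr
// once the last owner is gone.
bool demo_weak_expires() {
    std::weak_ptr<Foo> observer;
    {
        std::shared_ptr<Foo> owner = std::make_shared<Foo>();
        observer = owner;  // does not change the owner count
    }                      // owner destroyed here, so the Foo is deleted
    return observer.expired() && observer.lock() == nullptr;
}
```

None of these functions calls delete: each smart pointer releases the Foo when its last owner goes out of scope.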

Pointers are extremely powerful, and allow us to do a multitude of cool things with code. But with great power comes great responsibility, and pointer misuse can easily lead to run-time failures and errors.

References

C Plus Plus - malloc

Kernighan, B. W.; Ritchie, D. M.: The C Programming Language

Lippman, S. B.; Lajoie, J.; Moo, B. E.: C++ Primer

Smart Pointers