CVarViz

Visualize C variables and their pointer relationships.




Pointer Primer

Introduction

If you come from a language like Python, C#, or Java, you are used to garbage collection: An automatic process by which the dynamic memory your program requested is periodically reclaimed, by carefully determining which parts of it are no longer reachable or in use.

But what if you're working on a video game, embedded software for a low-spec device, or performance-critical cryptography? Is garbage collection good enough?

Usually, no: Garbage collection makes program performance significantly less predictable, can behave pathologically in some cases, and adds size and overhead to your language's runtime system, which you likely can't afford in settings where speed and efficient use of memory are crucial.

Older languages like C were designed to provide a lower level of access to your computer's hardware without requiring you to program directly in its complex instruction set architecture (and it's rare, especially these days, to encounter a system that doesn't have a C compiler).

With great power comes great responsibility, however. To interact with your machine at such a low level, you must take on the role of the garbage collector: If you ask for memory, you are responsible for releasing it when you're done. If you lose track of some memory you asked for, your program will 'leak' it (that is, hold on to it even though it is no longer needed, making it harder over time for your program, and other software, to get memory of its own). If you forget you've already released memory back to the system and try to release it a second time, your program may crash, often with an unhelpful "double free" error. And, arguably worse than that, if you think some memory is valid and accessible to your program but it actually isn't, you may encounter the dreaded "Segmentation fault".

And all of this isn't to mention the myriad ways low-level control makes it harder to write secure software: Some very common functions in the C standard library (and I mean common), such as strcpy and gets, perform no bounds checking at all, and using them can leave your software open to buffer overflows, one of the most common classes of exploits.

If none of these concepts are familiar, that's OK! This primer will hopefully help, and the visualization tool is intended to be a quick way to experiment with creating variables (especially pointer variables) and see how they can be used to perform "action at a distance" on other variables. The goal is to walk away with a better understanding of pointers, and have a relatively quick way to experiment with the basics.

What is memory?

To greatly oversimplify, modern computers consist of a few general classes of components; the two that matter most here are:

The CPU is the 'brains', executing the instructions your computer is built to understand (its instruction set architecture). These instructions typically involve arithmetic, reading and writing memory, and jumping to other instructions based on certain conditions (meaning it is possible to do more interesting things than execute a precise series of instructions in order).

RAM is fast, random-access (meaning you can read from or write to any location, in any order), short-term storage: It's what your programs use to hold the data they are actively working with.

'Short-term' here just means that RAM's contents do not persist once it loses power, unlike a hard disk or SSD. This is one of the trade-offs made so that RAM can offer extremely fast read and write times.

A common way to think about memory is as a bunch of boxes organized into a straight line, numbered starting from 0. We call these numbers the addresses of the memory cells that they correspond to, and it is through these addresses that we achieve random-access when reading and writing data; at each address, we can store a value.

What is a pointer?

With that crash course on the basics of memory out of the way, this is an easy question to answer! To repeat the mantra of my introductory computer science course (which was also a question on both the midterm and final exams):

"A pointer is a memory address."

That's really it!

Pointers in C

But how do we use pointers in C? When we write a simple C definition, such as:

          
int x = 17;
          
        

This associates the name x with a memory address, and stores the value 17 at that address. The type int tells C that the value stored at the address associated with x is an integer (C needs this to know which operations are permitted and how much memory is needed to store the value).

One of the greatest confusions with pointers is conflating the address and value associated with a variable. To help combat this, you can think of the variable name and address as being in one-to-one correspondence, and the variable names and values as being in a many-to-one correspondence (there is nothing wrong with multiple variables having the same value, but it's impossible for two distinct variables to have the same address).

C provides an operator to get the address associated with a variable name, known as the "address-of" operator. For example, the address of the variable x defined above is computed by writing &x.

What type of data is &x, though? Since it's a memory address, it's a pointer! But how do we write the type of a pointer in C, so we can create variables capable of storing pointer values like &x?

Enter one of the most frustrating parts of C's syntax: Pointer types. If T is a C type (such as int), then T * is the type of addresses of T-valued variables.

Where this gets fun is that you can keep adding *: T ** is the type of addresses of T *-valued variables, and so on! Before getting ahead of ourselves, though, this is what it looks like to store the address of x in another variable:

          
int *y = &x;
          
        

This says that y is a variable of type int * (that is, a pointer to an integer), whose value is the address associated with the variable x.

If we try to use the value of y directly as an int, we'll be using a pointer value where an integer is expected. While an address is just some kind of number in this simplified picture, C does not allow that implicit conversion, and compilers will warn about or reject it. Fortunately, since we know the address of something, we should be able to look up what is stored there: Enter the dereference operator!

          
int z = *y;
          
        

Here, we are defining a variable z of type int whose value is the value stored at the address held in y. What is its value? Well, if y is the address of x, then *y is the value at that address; in other words, the value of x.

If your head is spinning, never fear - this is precisely where the visualizer at the bottom of this page comes in handy.

Getting Started With CVarViz

CVarViz understands two kinds of inputs: Definitions and assignment statements. The former add variables to the environment (or directly redefine existing ones), and the latter update existing variables, either directly or indirectly through pointers.

To keep things simple, you can only define variables of type int (and the pointer types derived from this type).

The following can be used to define new variables:

And when writing assignments to update variables, you can use variables (or dereferenced variables) as the left-hand sides.

