More On Value And Reference Types

Published on 23 Apr 2022.

I sometimes hear people say, “Value types get stored on the stack and reference types get stored on the heap.” That’s a common misconception. There’s a bit of reason behind it, but reality is quite a bit more complex than that.

This post assumes you know a thing or two about classes and structs, but it could serve you well as a companion and alternate explanation of what the book covers in the Memory Management level, which is Level 14 in the 4th and 5th Editions. (The book has all those fancy diagrams that I think help shed light on what’s happening here. I didn’t have time to add more here.)

Really, the main difference is that a value type is stored right there in place while a reference type variable has two separate storage locations, one right there in place to hold a reference, and one to hold the data elsewhere. This second one will always be somewhere on the heap, but the first follows the same rules as a value type. Which I’m sure didn’t help clarify much, so let’s use some examples.

void SomeFunction(int parameter)
{
    int local = 3;
    Console.WriteLine(parameter + local);
}

C#, like the vast majority of languages out there, uses a stack for allocating local variables and parameters. So when SomeFunction is called, in addition to some bookkeeping for the method call itself that also goes on the call stack, 4 bytes are reserved for parameter and 4 bytes are reserved for local. That memory is freed when the code returns from SomeFunction. So in this case, all of the data lives on the stack. There are no heap allocations, and there is nothing for the garbage collector to do, since it is only concerned with the heap. (The stack can take care of its own allocations easily, thank you very much.)

Let’s swap that out for a reference type instead of an int. Early on in the book, the only two reference types that are discussed are string and arrays, both of which have other mental baggage, which I want to skip for now. So I’m going to just use the Random class as an example.

void SomeFunctionWithReferences(Random parameter)
{
    Random local = new Random();
    Console.WriteLine(parameter.Next(10) + local.Next(10));
}

Now local and parameter are both variables of a reference type. Enough space for two references is placed on the stack when SomeFunctionWithReferences is called, and nothing more. The size of a reference depends on the computer, but these days, most computers are 64-bit computers, and the C# runtime (the CLR) will use 64 bits, or 8 bytes, for each reference. So this will allocate 8 bytes to store a reference for parameter and 8 bytes to store a reference for local. The 8-byte reference for parameter is populated by the calling method (which we don’t directly see here). local, on the other hand, is not immediately filled with anything meaningful. Think of those bytes as holding some leftover bit pattern that isn’t semantically meaningful yet. (In practice, the runtime typically zeroes locals, but the language doesn’t let you count on that.) And that’s the reason C# won’t let you use a variable that has not been initialized: as far as your code is concerned, it contains unintelligible stuff. (I’m intentionally avoiding calling it garbage, even though that’s a good name for it, because this isn’t garbage as the C# garbage collector sees it.)

So the space for the reference for local comes into existence when the method call begins, but that’s space only for a reference and nothing more. (Same for parameter, though it gets filled in with a legitimate reference before SomeFunctionWithReferences starts running.)
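If you want to see those sizes on your own machine, here is a minimal sketch. It assumes that IntPtr.Size (the size of a native pointer in the current process) is a fair stand-in for the size of an object reference, which holds for the CLR as far as I know. The SizeDemo class name is made up for this example.

```csharp
using System;

public static class SizeDemo
{
    // Size in bytes of a native pointer in the current process.
    // Assumption: an object reference occupies the same number of bytes.
    public static int ReferenceSize() => IntPtr.Size;

    public static void Main()
    {
        Console.WriteLine($"64-bit process: {Environment.Is64BitProcess}");
        Console.WriteLine($"sizeof(int): {sizeof(int)} bytes");
        Console.WriteLine($"Reference size: {ReferenceSize()} bytes");
    }
}
```

In a 64-bit process, the last line should report 8 bytes; in a 32-bit process, 4.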

However, once new Random() runs, space for the new object is allocated on the heap, because Random is a reference type, and the memory reserved for local is only big enough to hold a reference and nothing more. The Random class is pretty small, so we’re probably only talking maybe 24 bytes or so. It might even be less than that, though I’m pretty confident it is at least 12 bytes. (We could do some stuff to check and see. We actually did something similar in here a week or so ago.) Anyway, the data that a Random instance needs to do its job will live on the heap. The actual “meat” of the data will always live on the heap for a reference type, but the variable containing a reference may live on the stack, as will be the case with local.

Calling that constructor will return the reference, and local = new Random() will store that reference in local so the actual object on the heap can be hunted down when needed. For example, on the next line.
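One way to convince yourself that the variable holds only a reference is to copy it into a second variable and watch both reach the same object. This is a small sketch using a hypothetical Counter class (not from the book) so the result is predictable, unlike with Random.

```csharp
using System;

// Hypothetical class for illustration; any reference type would behave the same way.
public class Counter
{
    public int Value;
}

public static class ReferenceDemo
{
    public static int SharedObjectValue()
    {
        Counter first = new Counter(); // object goes on the heap; 'first' holds a reference to it
        Counter second = first;        // copies the reference, not the object
        second.Value = 5;              // writes through the reference to the one shared object
        return first.Value;            // reads that same object
    }

    public static void Main()
    {
        Console.WriteLine(SharedObjectValue()); // prints 5
    }
}
```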

Suppose the method used a struct instead of Random. It is a little confusing here, because early on in the book, the value types that we see are all built-in types that have fancy syntax, including things like bool, int, and float. So it may be worth looking at some arbitrary struct. Suppose we had this:

struct Point // Points are my go-to type when I need something to illustrate stuff.
{
    public float X;
    public float Y;
    public float Z;
}

And now this method:

void SomeFunctionWithStructs(Point parameter)
{
    Point local = new Point();
    Console.WriteLine(parameter.X + local.Y);
}

Each struct is going to need 12 bytes to store three 4-byte floats. So when SomeFunctionWithStructs is called, 12 bytes will be allocated on the stack for parameter and 12 bytes will be allocated on the stack for local. Nothing is allocated on the heap.
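The 12-byte figure can be checked with a quick sketch. This uses Marshal.SizeOf, which reports the marshaled (unmanaged) size. For a simple struct of three floats like this one, that matches the in-memory size, but that is not guaranteed for every struct. The PointSizeDemo name is just for this example.

```csharp
using System;
using System.Runtime.InteropServices;

public struct Point
{
    public float X;
    public float Y;
    public float Z;
}

public static class PointSizeDemo
{
    // Marshaled size of Point in bytes: three 4-byte floats.
    public static int PointSize() => Marshal.SizeOf<Point>();

    public static void Main()
    {
        Console.WriteLine(PointSize()); // prints 12
    }
}
```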

Immediately after allocation, the 24 bytes for parameter and local hold nothing meaningful yet, but the 12 bytes for parameter are populated by copying whatever was passed in as an argument to SomeFunctionWithStructs. Meanwhile, local is still full of mystery bits as far as your code is concerned, and once again, the compiler will prevent you from using any of those mystery bits until you write to that location.

But it is notable that the 12 bytes for all of the data already exist! They aren’t waiting to come into existence later. They’re there whether you use them or not. (Though if something really goes unused, the compiler is smart enough to eliminate them when compiled with optimizations, as is the default in Release mode but not Debug mode.)

In this case, when you call new Point(), you do not allocate more memory. You’re just telling the system to populate the memory location for local with data. The default constructor for structs will fill it with zeroes, so that’s what happens here. No new memory is allocated, beyond what local already had, but now the compiler knows local has been initialized and is safe to use.

Calling a constructor is not mandatory for a value type, since the memory is already there, but that’s a discussion for another time.
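Here is a small sketch of both ideas: new Point() fills the existing storage with zeroes, and passing a Point to a method copies its bytes into the parameter. The Mutate method and StructCopyDemo class are names I made up for the illustration.

```csharp
using System;

public struct Point
{
    public float X;
    public float Y;
    public float Z;
}

public static class StructCopyDemo
{
    // 'parameter' is a copy of the caller's Point, so this write
    // changes only the callee's own 12 bytes.
    static void Mutate(Point parameter)
    {
        parameter.X = 100f;
    }

    public static float XAfterCall()
    {
        Point local = new Point(); // no new allocation; the storage is filled with zeroes
        Mutate(local);             // the callee gets its own copy
        return local.X;            // the caller's copy is untouched
    }

    public static void Main()
    {
        Console.WriteLine(XAfterCall()); // prints 0
    }
}
```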

To wrap this up, let’s talk about arrays again.

Arrays are reference types, but they can contain other things inside them. Those other things can sometimes be value types and sometimes be reference types. For example, int[] holds value types, while string[] and Random[] hold reference types.

void SomeFunction()
{
    int[] someNumbers;
    Random[] someRandoms;

    someNumbers = new int[] { 1, 2, 3 };
    someRandoms = new Random[] { new Random(), new Random(), new Random() };
}

When SomeFunction is called, enough space is reserved for the two local variables. They’re both reference types, so that will be space for a reference only. (Yes, even if it is an array of int. Arrays are all reference types, regardless of what type their contents are.)

The interesting part happens when you make the array instances. When the expression new int[] { 1, 2, 3 } runs, space on the heap will be allocated for the int array. There’s overhead for the array object itself, including some metadata about its type and also storing the length of the array, and then there’s space for its contents.

In this case, its contents are three int values, which is 12 total bytes, 4 per int. So there are 12 bytes right there inside the array object for its contents, and the memory for the whole array is allocated all at once. (If you were wondering, this is the reason arrays can’t just be dynamically resized to have more slots available.)

Once this space is allocated, then the 1, 2, and 3 values will be placed into their appropriate spots in memory.

So here, even though these ints are value types, they live within the heap, as a part of their parent, the array itself.
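Since the ints live inside a single array object of fixed size, "resizing" really means getting a different array. A sketch with Array.Resize, which allocates a new array, copies the elements over, and swaps the reference you pass in (ArrayResizeDemo is a made-up name):

```csharp
using System;

public static class ArrayResizeDemo
{
    public static bool ResizeMakesNewArray()
    {
        int[] original = new int[] { 1, 2, 3 };
        int[] resized = original;      // second reference to the same array object

        // Allocates a new 5-element array, copies 1, 2, 3 into it,
        // and points 'resized' at the new array.
        Array.Resize(ref resized, 5);

        // The original array object is unchanged and is now a different object.
        return original.Length == 3
            && resized.Length == 5
            && !ReferenceEquals(original, resized);
    }

    public static void Main()
    {
        Console.WriteLine(ResizeMakesNewArray()); // prints True
    }
}
```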

Contrast that with someRandoms. Once again, when new Random[] { ... } runs, space on the heap will be allocated for the Random array. There is, again, overhead for the array object itself, including the same metadata and length, and space for the array’s contents.

But in this case, the contents are not the Random instances themselves, but space for references to Random instances. They start life as null references, but space is allocated there, within the new Random[] object, for three references. (And again, that’s 8 bytes each on 64-bit computers, for a total of 24 allocated bytes just for the array data, not including the array overhead.)

When your program goes to populate the array, it sees those three new Random() calls, allocates yet another object on the heap for each of them, and stores references to them within someRandoms. A total of four objects allocated on the heap.
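A short sketch of the slots-hold-references idea: a fresh Random[] starts out with null in every slot, and two slots can hold references to the very same Random object. The class and method names are only for illustration.

```csharp
using System;

public static class ReferenceArrayDemo
{
    public static bool SlotsStartNull()
    {
        Random[] someRandoms = new Random[3]; // one heap object: the array, with three empty slots
        return someRandoms[0] == null
            && someRandoms[1] == null
            && someRandoms[2] == null;
    }

    public static bool SlotsCanShareOneObject()
    {
        Random[] someRandoms = new Random[3];
        Random shared = new Random();         // a second heap object
        someRandoms[0] = shared;              // each slot stores a reference, not a copy
        someRandoms[1] = shared;
        return ReferenceEquals(someRandoms[0], someRandoms[1]);
    }

    public static void Main()
    {
        Console.WriteLine(SlotsStartNull());         // prints True
        Console.WriteLine(SlotsCanShareOneObject()); // prints True
    }
}
```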

This is already really long, but I want to point out that those extra heap objects aren’t there because we called a constructor. They’re there because Random is a reference type rather than a value type.

If we had an array of Point structs, we’d get behavior very much like the int array.

// Using a constructor here that I didn't define earlier, but I'm sure you can imagine
// how it would be implemented.
Point[] points = new Point[] { new Point(0, 0, 0), new Point(25, 5, 0), new Point(-2, 6, 0) };

In this case, when new Point[] { ... } runs, space for the array is made, and like before, there’s some overhead including type information and length, and then space is reserved for the contents. Because Point is a value type, that will be enough space for each Point itself, which we said earlier was 3 * 4 bytes per float, or 12 bytes per Point, and we have three of them. So a total of 36 bytes is allocated to hold all of our points, plus the array overhead. Like an int array, no other objects are made on the heap, and once again, we have value types that live on the heap, but as a part of some other object, not living on their own.
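To see that the Point values live inside the array itself, here is a sketch that writes to an element in place and then copies one out. It sets fields directly rather than using the constructor mentioned above, so it works with the Point struct exactly as defined earlier. PointArrayDemo is a made-up name.

```csharp
using System;

public struct Point
{
    public float X;
    public float Y;
    public float Z;
}

public static class PointArrayDemo
{
    public static float ElementAfterCopyIsModified()
    {
        Point[] points = new Point[3]; // one heap object holding three zeroed Points inline
        points[1].X = 25f;             // writes directly into the array's own storage

        Point copy = points[1];        // copies the 12 bytes out into a local
        copy.X = -1f;                  // changes only the local copy

        return points[1].X;            // the element inside the array is unaffected
    }

    public static void Main()
    {
        Console.WriteLine(ElementAfterCopyIsModified()); // prints 25
    }
}
```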

I really need to wrap this up, so two final notes in parting:

  1. It is possible to have a value type that lives on the heap without being a part of something else. That is what happens when a value type is boxed. That’s a topic for another day.
  2. The compiler knows a ton about strings that muddies the water, making it harder to understand what’s going on, but strings are reference types. Let me say that again. Strings are reference types! Structurally, a string[] behaves like the Random[] example: it allocates an object with references out to each of its items. (Meanwhile, a single string itself is actually much like a char[], and since char is a value type, a single string looks more like the int[] example we saw earlier, just with char instead of int.)
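Both parting notes can be poked at with a short sketch. The first method boxes an int and shows the box holds its own copy of the value; the second asks the runtime which types are value types. The class and method names here are hypothetical.

```csharp
using System;

public static class PartingNotesDemo
{
    public static int BoxKeepsItsOwnCopy()
    {
        int number = 1;
        object boxed = number; // boxing: the value is copied into a new object on the heap
        number = 2;            // changes the local only, not the boxed copy
        return (int)boxed;     // unboxing copies the value back out
    }

    public static bool StringsAndArraysAreReferenceTypes()
    {
        return !typeof(string).IsValueType
            && !typeof(int[]).IsValueType
            && typeof(int).IsValueType
            && typeof(char).IsValueType;
    }

    public static void Main()
    {
        Console.WriteLine(BoxKeepsItsOwnCopy());                // prints 1
        Console.WriteLine(StringsAndArraysAreReferenceTypes()); // prints True
    }
}
```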