More On Value And Reference Types
I sometimes hear people say, “Value types get stored on the stack and reference types get stored on the heap.” That’s a common misconception. There’s a bit of reason behind it, but reality is quite a bit more complex than that.
This post assumes you know a thing or two about classes and structs, but it could serve you well as a companion and alternate explanation of what the book covers in the Memory Management level, which is Level 14 in the 4th and 5th Editions. (The book has all those fancy diagrams that I think help shed light on what’s happening here. I didn’t have time to add more here.)
Really, the main difference is that a value type is stored right there in place while a reference type variable has two separate storage locations, one right there in place to hold a reference, and one to hold the data elsewhere. This second one will always be somewhere on the heap, but the first follows the same rules as a value type. Which I’m sure didn’t help clarify much, so let’s use some examples.
void SomeFunction(int parameter)
{
    int local = 3;
    Console.WriteLine(parameter + local);
}
C#, like the vast majority of languages out there, uses a stack for allocating local variables and parameters. So when SomeFunction is called, in addition to some bookkeeping for the method call itself that also goes on the call stack, 4 bytes are reserved for parameter and 4 bytes are reserved for local. That memory is freed when the code returns from SomeFunction. So in this case, all of the data lives on the stack, there are no heap allocations, and there is nothing for the garbage collector to do, since it is only concerned with the heap. (The stack can take care of its own allocations easily, thank you very much.)
Let’s swap that out for a reference type instead of an int. Early on in the book, the only two reference types that are discussed are string and arrays, both of which have other mental baggage, which I want to skip for now. So I’m going to just use the Random class as an example.
void SomeFunctionWithReferences(Random parameter)
{
    Random local = new Random();
    Console.WriteLine(parameter.Next(10) + local.Next(10));
}
Now local and parameter are both of a reference type. Enough space for two references is placed on the stack when SomeFunctionWithReferences is called, and nothing more. The size of a reference depends on the computer, but these days, most computers are 64-bit computers, and the C# runtime (the CLR) will use 64 bits or 8 bytes for each reference. So this will allocate 8 bytes to store a reference for parameter and 8 bytes to store a reference for local. The contents of that 8-byte reference for parameter are populated by the calling method (which we don’t directly see here). local, on the other hand, does not hold anything meaningful yet. Conceptually, there’s some leftover bit pattern in those bytes (in practice, the runtime usually zeroes them, but the language doesn’t let you count on that), and whatever it is is not semantically meaningful yet. And that’s the reason C# won’t let you use a variable that has not been initialized, because it contains unintelligible stuff. (I’m intentionally avoiding calling it garbage, even though that’s a good name for it, but this isn’t garbage as the C# garbage collector sees it.)
So the space for the reference for local comes into existence when the method call begins, but that’s space only for a reference and nothing more. (Same for parameter, though it gets filled in with a legitimate reference before SomeFunctionWithReferences starts running.)
However, once new Random() runs, space for the new object is allocated on the heap, because Random is a reference type, and the memory reserved for local is only big enough to hold a reference and nothing more. The Random class is pretty small, so we’re probably only talking maybe 24 bytes or so. It might even be less than that, though I’m pretty confident it is at least 12 bytes. (We could do some stuff to check and see. We actually did something similar in here a week or so ago.) Anyway, the data that a Random instance needs to do its job will live on the heap. The actual “meat” of the data will always live on the heap for a reference type, but the variable containing a reference may live on the stack, as will be the case with local.
Calling that constructor will return the reference, and local = new Random() will store that reference in local so the actual object on the heap can be hunted down when needed, for example, on the next line.
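Here is a minimal sketch of that idea. It is not from the book, and the ReferenceDemo class and its method name are made up for illustration. It copies a reference into a second variable and checks that both variables lead to the same object on the heap. It also prints IntPtr.Size, which is the pointer size of the running process (8 bytes in a 64-bit process).

```csharp
using System;

public static class ReferenceDemo
{
    // Hypothetical helper: returns true when both variables refer to the same heap object.
    public static bool CopyingAReferenceSharesTheObject()
    {
        Random first = new Random(); // one object is allocated on the heap
        Random second = first;       // copies the reference, not the object
        return ReferenceEquals(first, second);
    }

    public static void Main()
    {
        // Pointer size for this process: 8 in a 64-bit process, 4 in a 32-bit one.
        Console.WriteLine($"Pointer size in this process: {IntPtr.Size} bytes");
        Console.WriteLine($"Same object? {CopyingAReferenceSharesTheObject()}");
    }
}
```

Only one Random object exists here, no matter how many variables hold a reference to it.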
Suppose this weren’t Random but some struct instead. It is a little confusing here, because early on in the book, the value types that we see are all built-in types that have fancy syntax, including things like bool, int, and float. So it may be worth looking at some arbitrary struct. Suppose we had this:
struct Point // Points are my go-to type when I need something to illustrate stuff.
{
    public float X;
    public float Y;
    public float Z;
}
And now this method:
void SomeFunctionWithStructs(Point parameter)
{
    Point local = new Point();
    Console.WriteLine(parameter.X + local.Y);
}
Each struct is going to need 12 bytes to store three 4-byte floats. So when SomeFunctionWithStructs is called, 12 bytes will be allocated on the stack for parameter and 12 bytes will be allocated on the stack for local. Nothing is allocated on the heap.
Immediately after allocation, those 24 bytes are full of unintelligible stuff, but the 12 bytes for parameter are populated by copying whatever was passed in as an argument to SomeFunctionWithStructs. Meanwhile, local is still full of mystery bits, and once again, the compiler will prevent you from using any of those mystery bits until you write to that location.
But it is notable that the 12 bytes for all of the data already exist! They aren’t waiting to come into existence later. They’re there whether you use them or not. (Though if something really goes unused, the compiler is smart enough to eliminate them when compiled with optimizations, as is the default in Release mode but not Debug mode.)
In this case, when you call new Point(), you do not allocate more memory. You’re just telling the system to populate the memory location for local with data. The default constructor for structs will fill it with zeroes, so that’s what happens here. No new memory is allocated, beyond what local already had, but now the compiler knows local has been initialized and is safe to use.
Calling a constructor is not mandatory for a value type, since the memory is already there, but that’s a discussion for another time.
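If you’d like to poke at this yourself, here is a small sketch. StructDemo and its method names are hypothetical, and I’m assuming a reasonably modern .NET (5 or later), where Unsafe.SizeOf<T>() is available without extra packages. It reports the size of a Point, checks that new Point() leaves every field zeroed, and shows that assigning a struct copies the data rather than sharing it.

```csharp
using System;
using System.Runtime.CompilerServices;

public struct Point
{
    public float X;
    public float Y;
    public float Z;
}

public static class StructDemo
{
    // Size of a Point's data in bytes: three 4-byte floats.
    public static int PointSize() => Unsafe.SizeOf<Point>();

    // new Point() fills the storage that local already has with zeroes.
    public static bool DefaultIsAllZeroes()
    {
        Point local = new Point();
        return local.X == 0f && local.Y == 0f && local.Z == 0f;
    }

    // Assigning a struct copies all of its data, so changing the copy
    // leaves the original untouched.
    public static float OriginalXAfterChangingCopy()
    {
        Point original = new Point();
        original.X = 1f;
        Point copy = original;
        copy.X = 99f;
        return original.X;
    }

    public static void Main()
    {
        Console.WriteLine($"Point size: {PointSize()} bytes");
        Console.WriteLine($"new Point() is all zeroes: {DefaultIsAllZeroes()}");
        Console.WriteLine($"original.X after changing the copy: {OriginalXAfterChangingCopy()}");
    }
}
```

Compare the copy behavior with a class: copying a reference gives you a second path to the same object, while copying a struct gives you a second, independent set of bytes.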
To wrap this up, let’s talk about arrays again.
Arrays are reference types, but they can contain other things inside them. Those other things can sometimes be value types and sometimes be reference types. For example, int[] holds a value type, while string[] and Random[] hold reference types.
void SomeFunction()
{
    int[] someNumbers;
    Random[] someRandoms;
    someNumbers = new int[] { 1, 2, 3 };
    someRandoms = new Random[] { new Random(), new Random(), new Random() };
}
When SomeFunction is called, enough space is reserved for the two local variables. They’re both reference types, so that will be space for a reference only. (Yes, even if it is an array of int. Arrays are all reference types, regardless of what type their contents are.)
The interesting part happens when you make the array instances.
When the expression new int[] { 1, 2, 3 } runs, space on the heap will be allocated for the int array. There’s overhead for the array object itself, including some metadata about its type and also storing the length of the array, and then there’s space for its contents. In this case, its contents are three int values, which is 12 total bytes, 4 per int. So there are 12 bytes right there inside the array object for its contents, and the memory for the whole array is allocated all at once. (If you were wondering, this is the reason arrays can’t just be dynamically resized to have more slots available.) Once this space is allocated, the 1, 2, and 3 values will be placed into their appropriate spots in memory. So here, even though these ints are value types, they live within the heap, as a part of their parent, the array itself.
Contrast that with someRandoms. Once again, when new Random[] { ... } runs, space on the heap will be allocated for the Random array. There is, again, overhead for the array object itself, including the same metadata and length, and space for the array’s contents. But in this case, the contents are not the Random instances themselves, but references to Random instances. They start life as null references, but space is allocated there, within the new Random[] object, for three references. (And again, that’s 8 bytes each on 64-bit computers, for a total of 24 allocated bytes just for the array data, not including the array overhead.) When your program goes to populate the array, it sees those three new Random() calls, allocates yet another object on the heap for each of them, and stores references to them within someRandoms. That’s a total of four objects allocated on the heap.
This is already really long, but I want to point out that those extra heap objects aren’t there because we called a constructor. They’re there because we’re using reference types instead of value types. If we had an array of Point structs, we’d get behavior very much like the int array.
// Using a constructor here that I didn't define earlier, but I'm sure you can imagine
// how it would be implemented.
Point[] points = new Point[] { new Point(0, 0, 0), new Point(25, 5, 0), new Point(-2, 6, 0) };
In this case, when new Point[] { ... } runs, space for the array is made, and like before, there’s some overhead including type information and length, and then space is reserved for the contents. Because Point is a value type, each slot will be enough space for a whole Point, which we said earlier was 3 * 4 bytes per float, or 12 bytes per Point, and we have three of them. So a total of 36 bytes is allocated to hold all of our points. Like an int array, no other objects are made on the heap, and once again, we have value types that live on the heap, but as a part of some other object, not living on their own.
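One rough way to see the difference between the three arrays is to ask the runtime how many bytes the current thread has allocated before and after creating each one. The sketch below does that with GC.GetAllocatedBytesForCurrentThread(). AllocationDemo and BytesAllocatedBy are names I made up, and the Point constructor is my guess at the one described in the comment above. The exact numbers include object overhead and vary by runtime and platform, so treat them as a comparison rather than as precise sizes.

```csharp
using System;

public struct Point
{
    public float X;
    public float Y;
    public float Z;

    // Hypothetical constructor, standing in for the one the post leaves to the imagination.
    public Point(float x, float y, float z)
    {
        X = x;
        Y = y;
        Z = z;
    }
}

public static class AllocationDemo
{
    // Hypothetical helper: bytes allocated on the managed heap by this thread while 'create' runs.
    public static long BytesAllocatedBy(Func<object> create)
    {
        // Warm-up call so one-time costs are less likely to be counted in the measurement.
        GC.KeepAlive(create());

        long before = GC.GetAllocatedBytesForCurrentThread();
        object result = create();
        long after = GC.GetAllocatedBytesForCurrentThread();

        GC.KeepAlive(result);
        return after - before;
    }

    public static void Main()
    {
        long ints = BytesAllocatedBy(() => new int[] { 1, 2, 3 });
        long points = BytesAllocatedBy(() => new Point[] { new Point(0, 0, 0), new Point(25, 5, 0), new Point(-2, 6, 0) });
        long randoms = BytesAllocatedBy(() => new Random[] { new Random(), new Random(), new Random() });

        Console.WriteLine($"int[3]:    {ints} bytes");
        Console.WriteLine($"Point[3]:  {points} bytes");
        Console.WriteLine($"Random[3]: {randoms} bytes");
    }
}
```

The Random[] figure should come out noticeably larger than the other two, because it counts the array plus a separate heap object for every element, while the int[] and Point[] figures cover a single array object each.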
I really need to wrap this up, so two final notes in parting:
- It is possible to have a value type that lives on the heap without being a part of something else. That is what happens when a value type is boxed. That’s a topic for another day.
- The compiler knows a ton about strings that muddies the water, making it harder to understand what’s going on, but strings are reference types. Let me say that again. Strings are reference types! Structurally, a string[] behaves like the Random[] example: it will allocate an object with references out to each of its items. (Meanwhile, a single string itself is actually much like a char[], and since char is a value type, a single string looks more like the int[] example we saw earlier, just with char instead of int.)
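For the curious, here is a tiny sketch of both parting notes. PartingNotesDemo and its method names are hypothetical. The first method boxes an int and shows that the box holds its own copy of the value on the heap. The second reads a character out of a string by index, the same way you would with a char[].

```csharp
using System;

public static class PartingNotesDemo
{
    // Boxing copies the value into a new object on the heap. The box keeps
    // its own copy even if the original variable changes afterwards.
    public static int BoxedValueAfterChangingOriginal()
    {
        int number = 42;
        object boxed = number; // boxing: a heap object now holds a copy of the value
        number = 7;            // only changes the local variable
        return (int)boxed;     // unboxing copies the value back out of the box
    }

    // A string exposes its characters by index, much like a char[] does.
    public static char SecondCharacter(string text) => text[1];

    public static void Main()
    {
        Console.WriteLine(BoxedValueAfterChangingOriginal());
        Console.WriteLine(SecondCharacter("heap"));
    }
}
```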