UTF-8 Strings in C# 11

Published on 16 Oct 2022.

In the previous blog post, we talked about character encodings. Of particular note, C#’s internal representation uses UTF-16, which is great for working with text in memory. But the Internet likes to transfer large JSON, HTML, and other files in UTF-8, since it uses fewer bytes in most cases.
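
To make the size difference concrete, here is a minimal sketch that counts the bytes the same text needs in each encoding. The class and method names (EncodingSizes, Utf8Size, Utf16Size) are made up for illustration; Encoding.UTF8, Encoding.Unicode (which is UTF-16 in .NET), and GetByteCount are standard library members.

```csharp
using System;
using System.Text;

// Hypothetical helper class, for illustration only.
public static class EncodingSizes
{
    // Number of bytes the text occupies when encoded as UTF-8.
    public static int Utf8Size(string text) => Encoding.UTF8.GetByteCount(text);

    // Number of bytes the text occupies when encoded as UTF-16
    // (exposed in .NET as Encoding.Unicode).
    public static int Utf16Size(string text) => Encoding.Unicode.GetByteCount(text);

    public static void Main()
    {
        string sample = "HTTP/1.0";
        Console.WriteLine($"UTF-8:  {Utf8Size(sample)} bytes");  // UTF-8:  8 bytes
        Console.WriteLine($"UTF-16: {Utf16Size(sample)} bytes"); // UTF-16: 16 bytes
    }
}
```

For plain ASCII text like this, UTF-8 needs half the bytes. Text with many characters outside the ASCII range can narrow or even reverse that gap, which is why the claim is "in most cases" rather than "always".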

That means you may find yourself needing to convert C# strings from UTF-16 to UTF-8 before sending them across the Internet, especially if you are working in ASP.NET. (And lots of C# programs are ASP.NET web applications.)

C#’s standard library has code to convert between different encodings. So if you want to encode the text "HTTP/1.0", you could do something like this:

byte[] encoded = Encoding.UTF8.GetBytes("HTTP/1.0");

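Unfortunately, that conversion happens at run time, and every call allocates a new byte array. Doing that all the time for text that never changes is quite slow.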

So some people got clever and hand-encoded it themselves, ahead of time:

ReadOnlySpan<byte> encoded = new byte[] { 0x48, 0x54, 0x54, 0x50, 0x2f, 0x31, 0x2e, 0x30 };

That’s now way faster, since the bytes are already encoded before the program ever runs, but it is virtually impossible to read. A good comment would help, but it is certainly not as clear as “HTTP/1.0”.
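
Hand-encoded bytes are also easy to get wrong. One way to guard against that is a small check that compares them with what the runtime encoder produces. This is only a sketch: the class and member names (HandEncodedCheck, HandEncoded, MatchesRuntimeEncoding) are hypothetical, while SequenceEqual is the standard span extension method from the System namespace.

```csharp
using System;
using System.Text;

// Hypothetical helper class, for illustration only.
public static class HandEncodedCheck
{
    // The hand-written bytes for "HTTP/1.0" from the example above.
    public static ReadOnlySpan<byte> HandEncoded =>
        new byte[] { 0x48, 0x54, 0x54, 0x50, 0x2f, 0x31, 0x2e, 0x30 };

    // True when the hand-written bytes equal the runtime UTF-8 encoding of the text.
    public static bool MatchesRuntimeEncoding(string text)
    {
        ReadOnlySpan<byte> expected = Encoding.UTF8.GetBytes(text);
        return HandEncoded.SequenceEqual(expected);
    }

    public static void Main()
    {
        Console.WriteLine(MatchesRuntimeEncoding("HTTP/1.0")); // True
        Console.WriteLine(MatchesRuntimeEncoding("HTTP/1.1")); // False
    }
}
```

A check like this belongs in a unit test rather than on a hot path, since it pays the runtime encoding cost that the hand-written bytes were meant to avoid.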

C# 11 added UTF-8 string literals, which let you write the text as a plain string and have the compiler convert it to UTF-8 for you:

var encoded = "HTTP/1.0"u8;

Notice the u8 on the end? That is what’s signaling that you want the string to be converted by the compiler to UTF-8!

I intentionally used var there to hide what is really happening. The type is actually this:

ReadOnlySpan<byte> encoded = "HTTP/1.0"u8;

It is a ReadOnlySpan<byte>, not a string! In fact, if you had written it as string encoded = "HTTP/1.0"u8;, it would have given you a compiler error. It is not a string!

But it does make it easy to write out UTF-8-encoded chunks of memory for use in places where that is necessary and helpful, such as in a RESTful web service.
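
Because the literal is a ReadOnlySpan<byte>, the usual span helpers work on it directly. The sketch below assumes a project with C# 11 enabled. The class and method names (Utf8LiteralDemo, StartsWithHttp10) are invented for illustration; StartsWith, ToArray, and Encoding.UTF8.GetString are standard library members.

```csharp
using System;
using System.Text;

// Hypothetical helper class, for illustration only. Requires C# 11.
public static class Utf8LiteralDemo
{
    // True when the buffer begins with the UTF-8 bytes of "HTTP/1.0".
    public static bool StartsWithHttp10(ReadOnlySpan<byte> buffer) =>
        buffer.StartsWith("HTTP/1.0"u8);

    public static void Main()
    {
        // Pretend these bytes arrived over the network.
        ReadOnlySpan<byte> response = "HTTP/1.0 200 OK"u8;
        Console.WriteLine(StartsWithHttp10(response)); // True

        // When an API needs a byte[] rather than a span, copy the bytes out.
        byte[] copy = "HTTP/1.0"u8.ToArray();

        // Decoding the bytes gives back the original text.
        Console.WriteLine(Encoding.UTF8.GetString(copy)); // HTTP/1.0
    }
}
```

Note that ToArray allocates a new array on every call, so it is best kept for the places where a span will not do.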

NOTE: As of the time of writing, C# 11 is not quite out yet. You may need to turn on C# 11 features for your project if you want to try this out today.