Parsing Enumerations

Published on 22 May 2022.

While the book covers enumerations in quite a bit of depth, one area that I feel could use a little more elaboration is converting enumerations to and from strings.

Imagine you have this enumeration that identifies types of elixirs:

enum ElixirType { Invisibility, Strength, Regeneration, Coffee }

It is easy enough to make a variable that uses this:

ElixirType currentPurchase = ElixirType.Regeneration;

But what if you want to ask the user to pick from one of those enumeration types?

There are several ways we can approach this.

Switch-Based Parsing using Strings

One way to approach this is to just ask the user to type in their choice and use a switch to convert to an enumeration:

Console.WriteLine("What type of elixir do you want? Options are 'invisibility', 'strength', 'regeneration', and 'coffee'.");
string input = Console.ReadLine();
ElixirType currentPurchase = input switch
{
    "invisibility" => ElixirType.Invisibility,
    "strength"     => ElixirType.Strength,
    "regeneration" => ElixirType.Regeneration,
    "coffee"       => ElixirType.Coffee
};

Blam. Done. That’s it.

I’m not aware of an industry standard name for this approach, but for this post, let’s call this switch-based parsing using strings.

There’s a rather significant limitation to this approach: humans are notoriously bad at following instructions (unlike those blessed computers, which follow everything perfectly).

There are several flavors of bad data that we might want to consider here. The first is, what do we do if somebody ignores our options and tries to access the SECRET MENU? If a user enters “infinite wisdom”, you’re going to have a bad time.

Now, I do want to say that in the early parts of this book–indeed, for any small program that isn’t used by the masses–it might be okay to go light on the input validation. At the point in the book where enumerations are introduced, we don’t have all of the tools that are necessary or useful for handling input validation well. So I’m not sure it’s unreasonable to just ignore this looming problem for now, and just trust that the humans using it will be shrewd enough to use it wisely, or forgiving enough to understand when “garbage in” leads to “garbage out.”

Having said that, a lot of people still feel like they want to make an attempt at wiser input handling, and that’s probably smart of them.

Our switch expression, above, would definitely allow us to put in a default value. Perhaps the simplest thing to do here is to just make one of the options the default:

ElixirType currentPurchase = input switch
{
    "invisibility" => ElixirType.Invisibility,
    "strength"     => ElixirType.Strength,
    "regeneration" => ElixirType.Regeneration,
    "coffee"       => ElixirType.Coffee,
    _              => ElixirType.Coffee // The default--the one with the least serious side effects.
};

Here, if the user enters anything besides the allowed options, they get coffee. (And if they don’t like coffee, “No elixirs for you!”)

Another variation on the idea would be to have a specific elixir type that represents an invalid elixir type. For example, we could add in ElixirType.Invalid:

ElixirType currentPurchase = input switch
{
    "invisibility" => ElixirType.Invisibility,
    "strength"     => ElixirType.Strength,
    "regeneration" => ElixirType.Regeneration,
    "coffee"       => ElixirType.Coffee,
    _              => ElixirType.Invalid
};

This opens up the option for us to say, “Hm. Looks like they didn’t pick a valid option after all,” and try again:

ElixirType currentPurchase;
do
{
    Console.WriteLine("What type of elixir do you want? Options are 'invisibility', 'strength', 'regeneration', and 'coffee'.");
    string input = Console.ReadLine();
    currentPurchase = input switch
    {
        "invisibility" => ElixirType.Invisibility,
        "strength"     => ElixirType.Strength,
        "regeneration" => ElixirType.Regeneration,
        "coffee"       => ElixirType.Coffee,
        _              => ElixirType.Invalid
    };

    if (currentPurchase == ElixirType.Invalid)
        Console.WriteLine("That wasn't a valid option. Try again.");
}
while (currentPurchase == ElixirType.Invalid); // Keep asking until we get a valid option.

This is, however, a bit counterintuitive. The idea with new type definitions is that we’re identifying the set of all valid values. Enumerations are very direct about their valid values, because they list them all when the enumeration is defined. It feels a bit strange to have one of those supposed valid values be called Invalid.

Invalid is clearly not meant to be a valid option, yet there it is, defined as a part of the enumeration, making it valid!

We’re carefully guarding against unintended use of ElixirType.Invalid here, but that doesn’t guarantee we won’t accidentally use it elsewhere. (And keep in mind, if you put Invalid as the first option in an enumeration, it becomes the default value, which is another way for it to sneak in unintentionally.)

So it isn’t great, but it does work. We’ll look at some other options in a moment.

But before we do, let’s look at another nuance. One thing I’ve seen a lot of readers do is try to at least remove the question of capitalization. I think this is a wise line of thinking. While we’ll potentially suffer from people entering garbage (which we solved above with the Invalid option), it would be nice to be forgiving about capitalization. That is, we said 'strength' was an option, but if they typed in Strength, perhaps we should allow that.

The simplest way to deal with that is to just include another arm to the switch:

        // ...
        "strength"     => ElixirType.Strength,
        "Strength"     => ElixirType.Strength,
        // ...

That will cover both strength and Strength, which are the two most common ways that word might be capitalized. But what about STRENGTH? What about STrength? If you don’t care about capitalization at all, one option is to take the input from the user and call ToLower() or ToUpper() on it, to get a whole new string that is all lowercase or all uppercase:

Console.WriteLine("What type of elixir do you want? Options are 'invisibility', 'strength', 'regeneration', and 'coffee'.");
string input = Console.ReadLine().ToLower(); // Lowercase-ify it here.
ElixirType currentPurchase = input switch
{
    "invisibility" => ElixirType.Invisibility,
    "strength"     => ElixirType.Strength,
    "regeneration" => ElixirType.Regeneration,
    "coffee"       => ElixirType.Coffee
};

Now this code will treat all capitalization variants the same. That may not always be what you want, but it is often reasonable.

Switch-Based Parsing using Numbers

One of the problems with the above approach is that we’re dealing with strings–which is an extremely broad set of potential values that we have to consider.

If we were dealing with a graphical interface, the options would probably be presented in a dropdown menu, and there’d be no possible way to enter a bad choice. Given that we’re currently limited to the console window, all input will, ultimately and unfortunately, be strings.

However, if we convert user input to an integer first, we can open up some extra options. (Note that I’m not trying to say that this way is better, just that it is different.)

Instead of saying, “Here’s the options, type in text that matches one of these,” we can assign a number to each:

int input;
while (true)
{
    Console.WriteLine("What type of elixir do you want?");
    Console.WriteLine("1 - Invisibility");
    Console.WriteLine("2 - Strength");
    Console.WriteLine("3 - Regeneration");
    Console.WriteLine("4 - Coffee");
    Console.Write("> ");

    input = Convert.ToInt32(Console.ReadLine());
    if (input >= 1 && input <= 4)
        break;
    else
        Console.WriteLine("That wasn't a valid option. Try again.");
}

ElixirType currentPurchase = input switch
{
    1 => ElixirType.Invisibility,
    2 => ElixirType.Strength,
    3 => ElixirType.Regeneration,
    4 => ElixirType.Coffee
};

With this version, once we leave the loop, we know we’ve got a valid option. So our switch does not need to handle any other case–it is safe. Furthermore, we can tell people they didn’t enter a valid option without needing to add in an ElixirType.Invalid option, which is a nice bonus.

This approach isn’t foolproof, though. If a human enters something that isn’t an integer, things will still blow up. Convert.ToInt32 is a great tool, but it does demand that the text it is given is actually a valid number. At this point in the book, we don’t have a full arsenal of tools available to us, so we’ll just live with this for now. But later on, we’ll learn about the TryParse methods, which solve this problem fully, and allow us to better handle all the garbage a human might throw at us. (You could probably jump ahead to the section on “Output Parameters” and learn it now, if you want.)
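For anyone who does jump ahead, here is a rough sketch of what the validation might look like with int.TryParse. It leans on output parameters, which the book covers later. The sample strings are made up for this illustration and stand in for whatever Console.ReadLine() would return.

```csharp
using System;

// Sample strings standing in for what a user might type at the prompt.
string[] samples = { "2", "7", "coffee", "" };

foreach (string text in samples)
{
    // int.TryParse returns false on bad text instead of crashing.
    // When it succeeds, the parsed value lands in 'number'.
    if (int.TryParse(text, out int number) && number >= 1 && number <= 4)
        Console.WriteLine($"'{text}' is option {number}.");
    else
        Console.WriteLine($"'{text}' wasn't a valid option. Try again.");
}
```

Only "2" passes here. "7" parses fine but is out of range, while "coffee" and the empty string are not integers at all.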

Generalized Approaches

The above approaches require us to write out all of the options–twice. Once to list the options and once when doing the conversions.

That duplication causes two problems:

  1. It isn’t robust to change. If you need to add or remove an item, you’ve got to revisit both of those places to revise the list.
  2. It is a lot of manual work! I initially had six options in my ElixirType enumeration and intentionally shortened it so that I wouldn’t have to do as much work for this post. But what if your enumeration had 24 options in it? With the above approaches, you’d need to type them all out, twice.

These two limitations leave a lot of people wondering if there isn’t a better way to do it.

The answer is yes, there is. In fact, there are two. The bad news is that both rely on tools that we’re perhaps not 100% ready for. The first approach uses the Type class. The second uses generics. Both of those topics are covered in far more depth later on in the book.

Yet… perhaps it is worth a discussion now to talk about those options, even if we’re getting a little ahead of ourselves. (Both of these topics deserve much more attention, but the specific way we’ll use them here will be intuitive enough, and it is probably okay for us to start using them tentatively, even before we’ve totally mastered them.)

The Generalized Approach with the Type Class

There is a class called Enum–similar to Console and Convert–that contains a bunch of methods for working with enumerations. This Enum class gives us some methods that let us do conversion of enumerations to strings and strings to enumerations. The gotcha is that there isn’t one enumeration type, but many. And the Enum class needs to handle them all. That requires some extra hoop jumping to make it work. You may not fully understand all of this hoop jumping right now, but it will hopefully make enough sense that you can work through it from intuition anyway.

Enum has a method called GetNames that returns a string[] with a string representation of all of the enumeration values of a specific enumeration. The way you indicate the specific enumeration type that you want is by giving it a Type object. The easiest way to do that is with the typeof operator. I’m not even going to really bother explaining Type objects and typeof here, other than to show an example. You can get all of the stringified versions of the ElixirType enumeration with this:

string[] elixirNames = Enum.GetNames(typeof(ElixirType));

We could use that to generate the list of options without having to type them all out ourselves:

string[] elixirNames = Enum.GetNames(typeof(ElixirType));
string options = "";
foreach (string elixirName in elixirNames)
    options += $" '{elixirName}'";
Console.WriteLine("What type of elixir do you want? Options are" + options + ".");

The loop generates the options for us! We don’t need to revisit this code if we add or remove an option. That’s pretty nice, right?

That particular version is missing a few niceties that our earlier, hand-crafted version had. For example, our earlier version had commas separating the items, and an "and" before the last. We could modify this code to support that if we wanted. But I’m going to skip that here.
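For anyone who does want those niceties, one possible sketch follows. FormatOptions is a hypothetical helper invented for this illustration, not something built into .NET, and it is only one of many ways to do this.

```csharp
using System;

string[] elixirNames = Enum.GetNames(typeof(ElixirType));
Console.WriteLine("What type of elixir do you want? Options are " + FormatOptions(elixirNames) + ".");

// Hypothetical helper: quotes each name, separates the names with commas
// (when there are three or more), and puts "and" before the last one.
static string FormatOptions(string[] names)
{
    string result = "";
    for (int index = 0; index < names.Length; index++)
    {
        bool isLast = index == names.Length - 1;
        if (index > 0)
            result += names.Length > 2 ? ", " : " ";
        if (isLast && names.Length > 1)
            result += "and ";
        result += $"'{names[index]}'";
    }
    return result;
}

enum ElixirType { Invisibility, Strength, Regeneration, Coffee }
```

With the four elixirs above, the prompt ends with: Options are 'Invisibility', 'Strength', 'Regeneration', and 'Coffee'.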

Once they make their choice, we need to convert their text back to an enumeration option, but there’s an Enum method for that as well: Enum.Parse:

ElixirType currentPurchase = (ElixirType)Enum.Parse(typeof(ElixirType), Console.ReadLine());

There’s a lot going on in there. Let’s start in the middle. Enum.Parse requires two parameters: the type you’re working with and the text to parse. Like before, we use typeof(ElixirType) to indicate the type we want. But Enum.Parse doesn’t know what to return, so it returns the most general-purpose type: object. We fix that by casting from object to ElixirType, and away we go.

One catch: this version demands an exact match in terms of capitalization. So if you type in "Strength" it will work, but if you type in "strength", it will not.

There’s a second variation of this method that has a third parameter that allows you to indicate if you want to ignore the case when comparing. We can use that and specify that we do, in fact, want to ignore case, to give our users a bit more flexibility:

ElixirType currentPurchase = (ElixirType)Enum.Parse(typeof(ElixirType), Console.ReadLine(), true);

Now "Strength", "strength", and even "STrENGTH" will all be turned into ElixirType.Strength.

While we’ve dealt with only the text-based input approach, you can probably figure out how you could make a menu where you pick a number using this same approach.
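As a hint in that direction, here is a minimal sketch of a numbered menu. It uses Enum.GetValues, a sibling of GetNames that hands back the enumeration values themselves rather than their names. The hard-coded input value is an assumption made to keep the example short. In a real program it would come from the user.

```csharp
using System;

// Enum.GetValues returns a general-purpose Array, so we cast it to the array type we expect.
ElixirType[] elixirTypes = (ElixirType[])Enum.GetValues(typeof(ElixirType));

Console.WriteLine("What type of elixir do you want?");
for (int index = 0; index < elixirTypes.Length; index++)
    Console.WriteLine($"{index + 1} - {elixirTypes[index]}");

// A sample value standing in for Convert.ToInt32(Console.ReadLine()).
int input = 3;

if (input >= 1 && input <= elixirTypes.Length)
{
    // Menu numbers start at 1, but array indexes start at 0.
    ElixirType currentPurchase = elixirTypes[input - 1];
    Console.WriteLine($"You picked {currentPurchase}.");
}
else
    Console.WriteLine("That wasn't a valid option. Try again.");

enum ElixirType { Invisibility, Strength, Regeneration, Coffee }
```

Because both the menu and the range check come from the array, adding or removing an elixir does not require touching this code.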

The Generalized Approach with Generics

Aside from using the Type class, there’s another way. This is actually the preferred way, though it uses generics, which are a bit more complicated. Generics are covered in a lot of depth later–we’re running well ahead of ourselves right now (assuming you’re reading this at about the time you’re in the Enumerations level). So don’t worry too much if you don’t understand it all–you’re not supposed to yet. But the pattern is simple enough that it can serve as a gentle introduction to generics (without bothering to really explain it at all here).

We used the versions of GetNames and Parse that require a Type object. But the forms we’ll see here ditch those extra parameters in favor of generics.

I’m just going to throw the new code in front of you here, but I think it will be intuitive enough:

string[] elixirNames = Enum.GetNames<ElixirType>();
string options = "";
foreach (string elixirName in elixirNames)
    options += $" '{elixirName}'";
Console.WriteLine("What type of elixir do you want? Options are" + options + ".");
ElixirType currentPurchase = Enum.Parse<ElixirType>(Console.ReadLine(), true);

The main difference there is using Enum.GetNames<ElixirType>() instead of Enum.GetNames(typeof(ElixirType)) and Enum.Parse<ElixirType>(Console.ReadLine(), true) instead of (ElixirType)Enum.Parse(typeof(ElixirType), Console.ReadLine(), true). Note, especially, that we don’t need to cast with this generics-based version, which is a nice plus.

TryParse

As I mentioned earlier when we were talking about parsing integers, the Enum class also has some TryParse methods in addition to the Parse methods. Those are more robust–they don’t crash on bad input–but also demand using output parameters, which is a somewhat more advanced concept. I’ll mention their existence here without showing them, but when you get to the part in the book that deals with output parameters and TryParse, just keep in mind that Enum has similar capabilities for parsing enumerations.
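That said, for anyone who has already peeked at output parameters, here is a rough preview of the shape it takes. Treat it as a sketch rather than the final word. The sample strings are made up and stand in for Console.ReadLine().

```csharp
using System;

string[] samples = { "Strength", "coffee", "infinite wisdom", "7" };

foreach (string text in samples)
{
    // Enum.TryParse returns false instead of crashing on unrecognized text.
    // The 'true' argument asks it to ignore capitalization.
    // The extra Enum.IsDefined check rejects numeric text like "7", which
    // TryParse accepts even though no elixir has that underlying value.
    if (Enum.TryParse(text, true, out ElixirType elixir) && Enum.IsDefined(typeof(ElixirType), elixir))
        Console.WriteLine($"'{text}' is {elixir}.");
    else
        Console.WriteLine($"'{text}' wasn't a valid option. Try again.");
}

enum ElixirType { Invisibility, Strength, Regeneration, Coffee }
```

One quirk worth knowing: the Enum parsing methods also accept the underlying number as text, so "2" would be treated as ElixirType.Regeneration here.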

A Downside to the Generalized Approach

Now that you’ve seen the generalized approaches, you might be thinking you’ll never go back. But there’s at least one place where this generalized approach has limits: the text used in the user interface reflects exactly what the code shows.

For the elixir type options we’ve had so far, that hasn’t been very problematic. But there are scenarios where it does have problems.

A simple example is a multi-word identifier. In code, we can’t use a space, so we typically use PascalCasing (capitalizing the first letter of each word) to name our enumeration members: ElixirType.UnderwaterBreathing, ElixirType.GoldenDragonsBreath, etc. That works fine in code, but these generalized approaches will simply mirror those names. Our users will be shown UnderwaterBreathing and GoldenDragonsBreath, and will need to type them in that way, without the spaces.

We could get clever and add spaces before capital letters and cut out whitespace from user input. But the more complex the names, the harder this is to do. For example, there’s really supposed to be an apostrophe in "Golden Dragon's Breath". It would be tricky to come up with a set of rules that adds that apostrophe in the general case. (How would your code even know whether a word ending in s is meant to be possessive (“Dragon’s”), plural (“Dragons”), or plural and possessive (“Dragons’”)? It is not easy!)
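To make the “get clever” idea concrete, here is a minimal sketch of both directions. AddSpaces is a hypothetical helper written for this illustration, and the two extra enumeration members are the ones named above.

```csharp
using System;

// Going from code to display text: add spaces before capital letters.
Console.WriteLine(AddSpaces("UnderwaterBreathing"));
Console.WriteLine(AddSpaces("GoldenDragonsBreath")); // Still no apostrophe!

// Going from user text back to code: strip the spaces, then parse while ignoring case.
string typed = "underwater breathing"; // Standing in for Console.ReadLine().
ElixirType parsed = Enum.Parse<ElixirType>(typed.Replace(" ", ""), true);
Console.WriteLine(parsed);

// Hypothetical helper: inserts a space before every capital letter except the first character.
static string AddSpaces(string name)
{
    string result = "";
    for (int index = 0; index < name.Length; index++)
    {
        if (index > 0 && char.IsUpper(name[index]))
            result += " ";
        result += name[index];
    }
    return result;
}

enum ElixirType { Invisibility, Strength, Regeneration, Coffee, UnderwaterBreathing, GoldenDragonsBreath }
```

This displays "Underwater Breathing" and "Golden Dragons Breath", then UnderwaterBreathing. The second line shows the limit of the rule-based approach: the apostrophe is still missing.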

Even if you could add in enough finesse to handle all the enumerations you care about, the word the programmers use for some concept might not be what you want to show the user. For example, you might have two algorithms that do something: the slow but high-quality approach called the Huygens-Steinbacher-Surry algorithm and the fast low-quality approach called the Floopy Schloopy algorithm that Reddit named. You might represent these in an enumeration–enum ConversionAlgorithm { HuygensSteinbacherSurry, FloopySchloopy }. Now good luck asking the user to pick between those options. Sometimes, we simply need human control to say, “Pick an option: ‘high quality but slow’ or ‘low quality but fast’.” The text we display to the user may be totally unrelated to the representation within the code. And while the two worlds often align, assuming they always will is a bad idea.
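One way to keep that human control is a small switch that maps each enumeration member to hand-written display text. This is only a sketch. DisplayText is a hypothetical helper invented for this illustration.

```csharp
using System;

// List every algorithm, showing the hand-written text instead of the code name.
foreach (ConversionAlgorithm algorithm in Enum.GetValues<ConversionAlgorithm>())
    Console.WriteLine($"'{DisplayText(algorithm)}'");

// Hypothetical helper: the human-friendly text lives here, separate from the identifiers.
static string DisplayText(ConversionAlgorithm algorithm) => algorithm switch
{
    ConversionAlgorithm.HuygensSteinbacherSurry => "high quality but slow",
    ConversionAlgorithm.FloopySchloopy          => "low quality but fast",
    _                                           => algorithm.ToString() // Fall back to the code name.
};

enum ConversionAlgorithm { HuygensSteinbacherSurry, FloopySchloopy }
```

This brings back some of the manual work the generalized approaches were meant to avoid, since each new member needs its own arm. That is the price of controlling the wording.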

We have similar issues when we want to support multiple languages. Our code can only represent things with a single identifier (typically but not always in English), but if we want to support English, Spanish, and Mandarin, we won’t be able to just show the raw enumeration member names. We’ll need to do translations.

For me, Enum.GetNames and Enum.Parse feel mass-produced and not careful enough, so I tend to use the earlier approaches, which let me add that extra bit of finesse around the text I’m displaying to the user. But that definitely takes more time, and there is absolutely a place for both options in the world of programming.