Heterogenous Lists

I’m used to C’s version of unions and structs. In C a union is simply a spot of memory into which various types of things can be stored. C doesn’t do all that much checking on the data types though. For example, if you construct a union of an 4-byte int and a 4-byte double. C will allow you to assign to the int and then later read from the double. Because these values share the same memory space, you will very likely get back garbage (the bytes of the int will be interpreted as if they were actually a double). C essentially allows you to subvert the type system in this fashion, leaving it up to the (unreliable) programmer to keep track of the type which is stored in that location, so that those bits are not ever used as if they represented a different type. C++ inherits this feature, making it especially dangerous, because the programmer might accidentally pull out the wrong class, and thus use those bits for a nonsensical dynamic method resolution.

What would be really nice is to have a union type, which the compiler is able to check for you. According to section 7.3 Records (Structures) and Variants (Unions) of Programming Language Pragmatics, Ada requires the use of a discriminant in its variants. Essentially, this means that the compiler is able to keep track of what type of value that part of memory represents. It also disallows converting of types. Once a value of type A is stored in that location, it will always be treated as type A, and never as a type B. This tag uses up extra memory, but does help provide type safety.

Also, in C, there is the array. Essentially a packed section of memory that contains many of the same type of object. It’s homogeneous. Only one type can be stored in it. Since I started using some of the more dynamic scripting languages, I’ve always found homogeneous data structures to be quite constraining. It’s just so inflexible to store a list of only one kind of thing. Many times I like to store a list (preserves order) of different kinds of things. I like heterogeneous lists. But how to implement them?

In C we can easily create a union of all the different types we might want to put in the list, and then just put them in like that. But wait! When we pull them out, we’ve now forgotten what type we used when we put them in. So we could create a struct with a tag (to tell us the type) and a union to store the data. Then every time we pull a value out of the array, we first look and the tag, and then use the appropriate data type. The compiler will be in the dark about most of this, and won’t catch certain mistakes (namely, pulling a value out as the wrong type, or omitting a type*)

In my dream language, I definitely want the compiler to type check my unions, such that I don’t accidentally try to pull out data with a different type than was used when I put it in. In this way I would be able to safely and easily create a heterogeneous list by simply creating a homogeneous list of unions. There will probably be a large run-time type-checking or tag-verifying code that would be unavoidable, but I’d like to make the analysis as static as possible for performance and safety reasons. I’m hoping that a typing system with powerful enough type-inference could help me out with that a bit.


* Actually, if the tag is an enum then C++ will verify that any switch-case exhaustively checks the enum. So omitting a type should raise a compiler warning.