November 2007

Module-Oriented Programming

I’ve been ruminating over the different ways in which we have structured our computer programs over the years.

Most of the world has focused on C, which organizes programs into data structures and functions that operate on those structures. The best example of this practice is the POSIX regex routines. First you transform a regex string into an internal data structure, a regex_t, via a call to regcomp(). You can then take that object and find matches in other strings using successive calls to regexec(), report any errors using regerror(), and finally free the data structure using regfree().
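As a rough sketch of that life cycle, here is how the four POSIX calls fit together. The helper name count_matches and its return convention are my own invention for illustration; only regcomp(), regexec(), regerror(), and regfree() come from <regex.h>.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper: count non-overlapping matches of `pattern` in `text`.
   Returns -1 if the pattern does not compile. */
int count_matches(const char *pattern, const char *text)
{
    assert(pattern != NULL && text != NULL);

    regex_t re;                 /* opaque to us: we never look inside it */
    int rc = regcomp(&re, pattern, REG_EXTENDED);
    if (rc != 0) {
        char msg[128];
        regerror(rc, &re, msg, sizeof msg);   /* turn the code into text */
        fprintf(stderr, "regcomp: %s\n", msg);
        return -1;              /* nothing to free after a failed compile */
    }

    int count = 0;
    int flags = 0;
    const char *p = text;
    regmatch_t m;
    while (regexec(&re, p, 1, &m, flags) == 0) {
        count++;
        if (m.rm_eo == m.rm_so) {
            /* Empty match: step one character, or stop at end of string. */
            if (p[m.rm_eo] == '\0')
                break;
            p += m.rm_eo + 1;
        } else {
            p += m.rm_eo;       /* resume right after the match */
        }
        flags = REG_NOTBOL;     /* p no longer points at the line start */
    }

    regfree(&re);               /* the one obligation the caller has */
    return count;
}
```

Calling count_matches("[0-9]+", "a1 b22 c333") should find three matches. Note that nothing in the caller depends on what a regex_t actually contains.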

This is a really effective form of encapsulation. The user does have to worry about freeing the structure, but this is no more hassle than flushing and closing a file, and it is standard practice in non-garbage-collected languages. In return, the user is nicely spared the ability to write code that depends on the layout of the regex_t data structure. The library writers can change the internal representation without affecting the user’s code. The user is also provided with all the functions that are convenient for operating on the mysterious data structure.

Then along came OO designs, promptly followed by the platypus effect. The root of this problem is the incestuous relationship that objects create between data and methods. In fact, the farther you travel down the path of objects, the closer you get to the Kingdom of Nouns. Eventually you decide that first-class functions are really, really useful, but by that time it’s far too late.

Now, as concurrency is beginning to rear its ugly head, it becomes increasingly apparent that the inheritance anomaly is inescapable, mostly because mutable state and parallelism don’t mix all that well; this problem is compounded when the data is bound up together with the methods that act on it.

In the days before OO design, we’d have data and functions, and they’d lead separate existences, and all was good. Then we found that we could get code re-use via inheritance, and by logically binding the data with the methods that act on it. This gave us a means of creating scalable GUI programs, with clear and maintainable logical separation throughout the code. Inheritance afforded us a means of reducing that complexity. But as we build larger and more complex systems, as the hardware changes underneath these designs, and as computer and network architectures change, our languages need to adapt by offering new and more powerful abstractions.

So the new and popular means of software architecture is to logically dissect the code into modules. Each module should encapsulate its data in private structures, away from end-programmer manipulation. Each module should also provide the methods that act on those data structures, and scope resolution can provide the naming convention, preventing name clashes. I think that this approach provides benefits that OO design does not, and it does not burden the programmer with contracts or other excessive verbosity. It also re-introduces the separation of data and procedure.
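To make that concrete, here is a minimal sketch of such a module in C, using a bounded buffer as the example. C has no scope resolution operator, so a buffer_ prefix stands in for it. All names here are assumptions of mine, and the two halves would normally live in separate files (buffer.h and buffer.c) so that users only ever see the incomplete type.

```c
#include <assert.h>
#include <stdlib.h>

/* ---- buffer.h: everything a user of the module sees ---- */
typedef struct buffer buffer;              /* incomplete type: layout hidden */
buffer *buffer_new(size_t capacity);       /* NULL on failure */
int     buffer_put(buffer *b, int value);  /* 0 on success, -1 if full */
int     buffer_get(buffer *b, int *out);   /* 0 on success, -1 if empty */
void    buffer_free(buffer *b);

/* ---- buffer.c: private to the module ---- */
struct buffer {
    size_t capacity, count, head;          /* ring buffer bookkeeping */
    int   *items;
};

buffer *buffer_new(size_t capacity)
{
    if (capacity == 0)
        return NULL;
    buffer *b = malloc(sizeof *b);
    if (b == NULL)
        return NULL;
    b->items = malloc(capacity * sizeof *b->items);
    if (b->items == NULL) {
        free(b);
        return NULL;
    }
    b->capacity = capacity;
    b->count = 0;
    b->head = 0;
    return b;
}

int buffer_put(buffer *b, int value)
{
    assert(b != NULL);
    if (b->count == b->capacity)
        return -1;
    b->items[(b->head + b->count) % b->capacity] = value;
    b->count++;
    return 0;
}

int buffer_get(buffer *b, int *out)
{
    assert(b != NULL && out != NULL);
    if (b->count == 0)
        return -1;
    *out = b->items[b->head];              /* oldest element first (FIFO) */
    b->head = (b->head + 1) % b->capacity;
    b->count--;
    return 0;
}

void buffer_free(buffer *b)
{
    if (b == NULL)
        return;
    free(b->items);
    free(b);
}
```

The shape is the same as the regex routines: the data lives in one place, the functions in another, and the only thing tying them together is the type in the function prototypes.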

But how does the module-oriented paradigm stand up to the anomalies present in OO designs? Generally we want the data structure to be tagged with state information (using the typing system) and the methods to operate only on data structures of the appropriate type (the compiler examines and matches based on function prototypes).

  • History Sensitiveness of acceptable states.
  • Suppose that we wish to extend a buffer to include a gget function that should only be callable immediately after a call to put. We therefore need to add history awareness to the data structure by splitting the partial state into two states {after_put, not_after_put}. The best approach here is to declare the two new states as subtypes of partial (hopefully the original coder thought ahead to have each data structure state represented by a unique type); then a re-implementation of put to return after_put is all that is required. The gget method would only accept after_put, while all other functions naturally accept after_put, because it is a subtype of partial. All the other methods will return one of {partial, full, empty}, none of which can be used with gget. This essentially uses the typing system as a guard.

  • Partitioning of acceptable states.
  • Suppose that we wish to add a get2 method that will pop 2 elements from the buffer. It should only be callable if the buffer contains at least 2 elements. Thus the partial state is split into {partial_1, partial_more_than_1}. Unfortunately, every method now needs to be re-implemented to accommodate the new states. In principle it’s impossible to avoid this problem. We could try this though: alter the typing system so that, for any function that accepts, modifies, then yields a data structure, whatever sub-type of that structure is passed in is also the sub-type passed back out. Then we might be able to localize the change to affect only the get and put methods. More generally, though, the change still affects every function that can cause a state change.

  • Modification of acceptable states.
  • Suppose now that we wish to lock the data structure against further changes. Two new methods, lock and unlock, can do this. Without modifying any of the other methods, we can introduce new types that the existing methods won’t accept, thereby using the typing system as a guard. Unfortunately, any methods that don’t modify the structure should still be callable, so re-implementing all of them will be necessary. (Wrapper functions are useful here.)
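The first and third cases above can be roughed out in C by giving each state its own struct type and passing buffers by value. This is only a sketch under some loud assumptions: C has no subtyping, so an explicit as_partial() conversion stands in for it; the empty and full states are folded into partial and checked with assert; and every name below is made up for illustration.

```c
#include <assert.h>

enum { CAP = 8 };

/* Shared representation; a real module would hide this. */
typedef struct { int items[CAP]; int count; } buf_data;

/* One C type per state. Distinct struct types are not interchangeable,
   so handing a function the wrong state is a compile-time error. */
typedef struct { buf_data d; } partial_buf;
typedef struct { partial_buf base; } after_put_buf;  /* "subtype" of partial */
typedef struct { buf_data d; } locked_buf;

partial_buf buf_new(void)
{
    partial_buf b = { .d = { .count = 0 } };
    return b;
}

/* Explicit upcast, standing in for implicit subtyping. */
partial_buf as_partial(after_put_buf b) { return b.base; }

/* put is the only function that yields the after_put state. */
after_put_buf buf_put(partial_buf b, int value)
{
    assert(b.d.count < CAP);
    b.d.items[b.d.count++] = value;
    after_put_buf r = { .base = b };
    return r;
}

/* Removes the oldest element and returns a plain partial buffer. */
partial_buf buf_get(partial_buf b, int *out)
{
    assert(b.d.count > 0);
    *out = b.d.items[0];
    for (int i = 1; i < b.d.count; i++)
        b.d.items[i - 1] = b.d.items[i];
    b.d.count--;
    return b;
}

/* History-sensitive: only accepts a buffer fresh from buf_put. */
partial_buf buf_gget(after_put_buf b, int *out)
{
    return buf_get(as_partial(b), out);
}

int buf_size(partial_buf b) { return b.d.count; }

/* Locking moves the data into a type the mutators refuse to accept. */
locked_buf buf_lock(partial_buf b)
{
    locked_buf l = { .d = b.d };
    return l;
}

partial_buf buf_unlock(locked_buf l)
{
    partial_buf b = { .d = l.d };
    return b;
}

/* Read-only operations need a wrapper per new state, as noted above. */
int buf_locked_size(locked_buf l) { return l.d.count; }
```

With these definitions, buf_gget(buf_put(b, 7), &v) compiles, while passing a partial_buf or a locked_buf to buf_gget is rejected by the compiler. The sketch also shows a limit of the idea in C: because the structs are copied, nothing stops a caller from reusing a stale after_put_buf, so the guard is only as strong as the language’s typing system.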

Ok, so there’s no free lunch, and this doesn’t solve all the problems I was hoping it would. I also neglected to mention exactly how overloading and overriding would be implemented with modules and encapsulation enforced via scoping.

Next time I’ll ramble on more about the different paradigms (encapsulation, containment, and inheritance). I’ll focus on composition and its necessity in the world of concurrency.
