eric the fruitbatBlog
Sounding out the Noosphere.

Posts from November, 2007

The Three types of Existence

Posted by Eric Hennigan
On November 26th, 2007 at 21:11

Permalink | Trackback | Links In |

Comments (3) |
Posted in Ideas, Philosophy, Politics, Strange Loop

I know that I promised a post about process composition and software architecture, but this just can’t wait. Last night I had a really nice conversation with one of my high school friends about the nature of existence. It all started off with my unsubstantiated claim that “the self does not exist.” As a philosophy major, he immediately started grilling me about what I meant by ’self’, and then, as we descended into an argument over semantics, what I meant, by ‘exist’. Unlike most semantic arguments, this one actually had a beneficial outcome. Together, we discovered that there are at least three different types of existence.

  • Physical Existence
  • This is the existence of all physical objects; cars, planes, computer hardware, chairs, tables, bodies, etc. One aspect of the nature of physical objects is that they cannot be exactly duplicated. (well, technically subatomic stuff like a photon can, but my arguments are all about the macroscopic realm). You see, when a physical object, such as a car, is duplicated you end up with two different cars. You do not end up with two instances of a single car. That is, physical objects cannot share identity.

  • Virtual Existence
  • This is the existence of all virtual objects; computer software, digital content, the conscious self, etc. These objects have an entirely different nature of existence from object in the physical realm. These objects can potentially be copied (to enough precision) that we’d call the result identical. Not only that but, virtual objects are conceptually removed from the physical media used for their instantiation. If I make two copies of a movie onto two different physical CD’s, it’s still termed the same movie, they share identity (with respect to the interpreter that defines the virtual objects). My conscious self must be a virtual object, because I can survive the Dan Dennett teleporter.

  • Platonic Existence
  • Quite naturally, this is the realm of Ideas. We know that Ideas are an existence level removed from the virtual realm because a single Idea can have multiple virtual instantiations. That is, in order to communicate an idea, it is first instantiated into words/diagrams/language, but if I can still communicate the same idea via a different language or different wording.

Note that the very essence of this distinction is predicated on whether or not we are inclined to apply identity to two different instantiations of a thing. I therefore say there are at least three different types of existence, because I wish to leave open the possibility of a whole hierarchy of existences like a hierarchy of types. In fact the virtual realm seems particularly flexible because of its dependence on an interpreter (is it DeCSS or a prime number?). In summary a Platonic Idea can have multiple instantiations in the virtual realm, yet retain identity; while a virtual object can have multiple instantiations in the physical realm, and still retain identity.

We should also recognize that Plato would never have thought up the idea of a Virtual realm in between that of the Physical and Platonic, because there were no computers in his day. This is only one way in which the creation and cultural acceptance of a computer has fundamentally changed the fundamental philosophy with which we view and analyze the world. The virtual realm is really very new, and mostly unexplored (at least in a legal sense). In fact there are still many of us that are all too willing to take the laws formulated about one realm (ownership of physical objects) and apply it to another (digital objects) without first examining the very existential nature of those objects. We’ve really got to get our act together, or we risk imposing on the virtual world those limitations present in the physical world.

I unfortunately haven’t seen much in terms of exploration of these three different types of existence, but then I just really discovered the difference yesterday. (well, I always knew that there was a difference, and that it was based somewhere on the ability to generate copies, but I didn’t recognize that that was the real fundamental difference, the very essence of existence in each of the realms).

See also:
The Digital Imprimatur
The Right to Read

Module-Oriented Programming

Posted by Eric Hennigan
On November 22nd, 2007 at 20:11

Permalink | Trackback | Links In |

Leave a Comment |
Posted in Comp*, Language

I’ve been ruminating over the different ways in which we have structured our computer programs over the years.

Most of the world has focused on C, and it seems to organize programs into data structures and functions that operate on those structures. The best example of this practice are the Regex routines. First you transform a regex string into an internal data structure, a regex_t, via a call to regcomp(). You can then take that object and find matches in other strings using successive calls to
regexec()
, report any errors using regerr(), and finally free the data structure using regfree().

This is a really effective form of encapsulation. Though the user does have to worry about freeing the structure this is no more hassle than flushing and closing a file, and is standard practice in non-garbage-collected languages, the user is nicely spared the ability to write code that depends on the layout of the regex_t data structure. The library writers can change the internal representation without affecting the users code. The user is also provided with all the functions that are convenient for operating on the mysterious data structure.

Then along came OO designs, promptly followed by the platypus effect. The root of this problem is the incestuous relationship that objects give data and methods. In fact the farther you travel down the path of objects, the closer you get to the Kingdom of Nouns. Eventually you decide that first-class functions are really, really useful, but by that time it’s far too late.

Now, as concurrency is beginning to rear its ugly head, it becomes increasingly apparent that the inheritance anomaly is inescapable, mostly because mutable state and parallelism don’t mix all that well; this problem is compounded when the data is bound up together with the methods that act on it.

In the days before OO design, we’d have data and functions, and they’d lead separated existences, and all was good. Then we found that we could get code re-use via inheritance, and logically binding the data with the methods that act on it. This gave us a means of creating scalable GUI programs, with clear and maintainable logical separation throughout the code. Inheritance afforded us a means of reducing that complexity. But, as we build larger and more complex systems, as the hardware changes underneath these designs, as the computer and network architecture changes, our languages need to adapt for new and more powerful abstractions.

So the new, and popular means of software architecture is to logically dissect the code into modules. Each module should encapsulate the data into private structures, away from end programmer manipulations. Each module should also provide the methods with act on those data structures, and scope resolution can provide the naming convention, preventing name clashes. I think that this approach provides benefits that the OO design does not, it also does not burden the programmer with contracts or other excessive verbosity. And it re-introduces the separation of data and procedure.

But how does the module-oriented paradigm stand up to the anomalies in present in OO designs? Generally we want the data-structure to be tagged with state information (using the typing system) and the methods operate only on data structures of the appropriate type (compiler examines and matches based on function prototypes).

  • History Sensitiveness of acceptable states.
  • Supposing that we wish to extend a buffer to include a gget function that should only be callable immediately after a call to put. Therefore we need to add history awareness to the data structure by splitting the partial state into two states {after_put, not_after_put}. The best approach here is to declare the two new states as subtypes of partial (hopefully the original coder thought ahead to have each data structure state represented by a unique type), then a re-implementation of put to return the after_put is all that is required. The gget method would only accept after_put, while all other functions naturally accept after_put, because it is a subtype of partial. All the other methods will return one of {partial, full, empty} none of which can be used with gget. This essentially uses the typing system as a guard.

  • Partitioning of acceptable states.
  • Supposing that we wish to add a get2 method that will pop 2 elements from the buffer. It should only be callable if the buffer contains more than 2 elements. Thus the partial state is split into {partial_1, partial_more_than_1}. Unfortunately, now every method needs to be re-implemented to accommodate the new states. In principle it’s impossible to avoid this problem. We could try this though: Alter the typing system so that for any function that accepts, modifies, then yields a data structure, for any sub-type of that structure that is passed in, the same sub-type is passed back out. Then we might be able to localize the change to affect only get and put methods. More generally though the localization affects all functions that can cause state change.

  • Modification of acceptable states.
  • Supposing now that we wish to lock the data structure from further changes. Two new methods lock and unlock can do this. Without modifying any of the other methods, we can introduce new types that the existing methods won’t accept. Thereby using the typing system as a guard. Unfortunately, any methods that don’t modify the structure should still be callable, so re-implementing all of them will be necessary. (wrapper functions are useful here)

Ok, so there’s no free lunch, and this doesn’t solve all the problems I was hoping it would. I also neglected to mention exactly how overloading and overriding would be implemented with modules and encapsulation enforced via scoping.

Next time I’ll ramble on more about the different paradigms (encapsulation, containment, and inheritance). I’ll focus on composition and it’s necessity in the world of concurrency.

The Assembly Programmer From Hell

Posted by Eric Hennigan
On November 15th, 2007 at 23:11

Permalink | Trackback | Links In |

Comments (1) |
Posted in Bio, Ideas

Today I was reading chapter 3 (Revenge of the Nerds) in Steven Pinker’s book How the Mind Works. It talks about evolution and Darwin and Dawkins, and got me thinking about how replicators are all coded with DNA, and how this coding interacts with the processes of evolution; specifically, mutation and inheritance. It already widely known that human DNA (among the largest as measured in number of codons) is filled with ‘junk’. Well, how exactly would ‘junk DNA’ affect the processes of evolution?

Current theory says that the junk is accrued via viral payloads infecting, but not destroying, cellular hosts. That means that the organisms which have undergone more evolution ought to have increasingly higher percentages of ‘junk DNA’. I have no idea if this is actually supported by evidence, but it would be cool to know. In fact just a simple graph of ‘percentage junk DNA’ vs. ‘total amount of DNA’ for all the organisms (colored by class: mammal/bird/reptile/fish/plant) would be a cool thing to observe. But this is an aside to my primary thought.

I had the idea that all the space devoted to the junk sequences might just be used as RAM or swap space to the evolutionary processes of mutation and crossover. Ample space in this regard might really cut down on the error rate a programmer like evolution would incur. So all that ‘junk DNA’ probably has beneficial side-effects for the species, though not for any individual involved. Because, lets face it, Evolution produces the worst spaghetti code there is, it has no concept of over-all design, ignores engineering principles of simplicity and elegance, conflates uses and meaning of the code by reading forwards, backwards, and inverse, works only at the lowest level, and isn’t goal-directed in any meaningful sense of the term. In the end it just goes with what ever works, independent of how it arrived at the solution. (which explains the title of this post)

But seriously, the junk DNA probably has some functional purpose. Does it provide padding so that some genes line up along preferred cross-over points? Does it provide the ability to swap in/out entire genes during cross-over, instead of splicing only at the sub-gene level?

Random Ruminations

Posted by Eric Hennigan
On November 6th, 2007 at 22:11

Permalink | Trackback | Links In |

Leave a Comment |
Posted in Comp*, Education, Ideas, Idiocracy, Punditry, Self

My thoughts are as entropic as the leaves falling off the trees this season (just about as colored too).

  1. Graduate School Applications
  2. <rant>I really dislike forms; actually I frothingly detest them. I don’t know why, but I have always hated pushing paper. Writing down all that information about myself, just so that it can populate some fields in somebody’s (probably insecure and unorganized) database. It’s disgusting. I already know my name, rank, and serial number; repeating it 100 times won’t change that.</rant>

    Seriously though, I don’t really get graduate applications. I understand that all Universities receive more applicants than they have positions, so they are forced to make selection. But basing that decision on something as scant as a couple pages authored by the idealistic applicant corroborated with a half-page letter from three people chosen because they 1. like/friends with the applicant, 2. applicant believes they will give good testimony, and 3. that this will sway the minds of the decision committee; just seems like an injustice. I don’t think that there’s enough information that can be crammed into that paper that would justify a mutual commitment of 6-8 years and several hundred thousand dollars on equipment, time, education, etc.

    I really believe that having a screening process closer to that of major businesses would be more effective. I should be able to submit a brief resume and slowly work my way through phone and personal interviews. I don’t mind having to provide contacts that would need to give their opinion about my qualifications, nor do I think they would mind being contacted about such. I feel that due to the extended nature of this approach, it would result in much greater knowledge on the part of both parties. Graduate schools have a much clearer idea of what subject focus each student has/needs and the student would have a clearer idea of what was expected of them at that department. Even those facing rejection would benefit, as they’d stand a better chance of picking up on a few things that they would otherwise be missing should they try again.

    I know that this is far too much work and time to expect an ordinary faculty member given the current demands of research, papers, and grant proposals (and would be a waste given their pay scale); but this work could be farmed out to those seeking tenure, assistant profs, post-docs, or current grad students. Optionally, I can see universities outsourcing this effort (but I ultimately think that’d be a disservice to everyone involved). It could also be set up so that the lower ranking candidates get shuffled into a curriculum more suited to their demeanor, as I hear Germany does with its trade schools. (Pushing the non-academic into a theory class just doesn’t work out that well, it doesn’t play to the hands-on talent of the individual)

    Ultimately, even if this interview-centric approach was taken up, I’d still rant about the foibles its implementation, yet I’d probably be better off. Ideally, I would wish for a system that removes all the subjective assessment that goes into hiring/recruitment decisions; but I don’t think that’s viable in the real world. I’d also like the current system to be much more flexible; classes should be more interchangeable among universities, so that moving from one school to another is no more difficult than moving to another group within the same company. Erdös really takes the cake on mobility.

  3. Side Project
  4. For a long time now I’ve been wanting to make a self-serving altruistic contribution to the Free Software movement, and the world at large, by writing code that others would find useful. I’ve landed on 3 such ideas, but am currently having a tough time choosing. Inspired by the pet-project rule at Google, I’m going to devote every other friday to one of these activities regardless of approval by existing management. (That’s right: I’m going rogue!)

    • Pdf Editor
    • Several times at work I have found myself in need of a pdf editor. I know that pdf is supposed to be a finalized, published document; yet when you don’t have access to the original, it’s nice to be able to edit the copy you do have. I hear that the format is pretty ghastly for an editor to deal with though. Personally, it’s also a low-priority need, so I probably won’t do this. Still, I know that the community could really use one, and it would make a killer addition to KDE.

    • Kapers
    • I recently discovered that I’ve collected enough academic articles that I’m in need of an actual organizational system if I’m ever going to remember what I’ve been reading. Over in the land of Apple this is a solved problem, and everyone can have their own Personal Library of Science. Well, since I can’t reasonably kicks the habit of reading peer reviewed articles (esp. considering my career change back into academia), I should probably help out fellow researchers by coding up a clone for KDE. This has the fringe benefit of helping out a community which I believe provides the hidden force on which the world turns. This seems to be closest to my abilities as a programmer right now.

    • Stock Market AI
    • This one has been thought of before, but that doesn’t make it any less appealing. It’s a great opportunity to learn from a wide diversity of subjects: Statistics, Mathematical Finance, Pattern Recognition, Machine Learning, AI, software architecture, etc…and there’s certainly no shortage of data to work with. If I actually succeed (and some have) then I can pour that money into other things. I’d personally love to guarantee funding for several important scientific endeavors: Immortality (yes, I share the vision propounded by Aubrey de Grey and Ray Kurzweil) or Nuclear Fusion (it’s only been 20 years away for the last 40 years) or general AI (We’re close already). I’ll probably hold off on this one, saving it for a side-project when I’m in graduate school.

OOCP

Posted by Eric Hennigan
On November 4th, 2007 at 16:11

Permalink | Trackback | Links In |

Comments (1) |
Posted in Comp*, Language

Today I read Analysis of Inheritance Anomaly in Object-Oriented Concurrent Programming Languages, and it seems that the idea that I had last week about implementing every object instance as it’s own process has already been considered. As I expected, this attempt to marry principles of inheritance and polymorphism within a framework of concurrency is fraught with problems. Most of the problems are related to state. It’s relatively easy to make each method a ‘critical region’ so that methods are run in sequential fashion (queued in a message box) so that they don’t step on member variables, thus making state transitions atomic. This isn’t enough.

Matsuoka and Yonezawa identify a few cases where inheritance can cause rather difficult problems.

  • Partitioning of Acceptable States
  • wherein what was considered a single state in the base-class is now divided into 2 or more states in the subclass. Their example being a sub-class of a buffered queue that introduces a method get2 that will return 2 items from the buffer. This partitions the base class states of {full, empty, partial} to {full, empty, twoOrMore, one}. The new states require re-implementation of the base methods get and put in order to track the availability of the get2 method.

  • History-only Sensitiveness of Accptable States
  • wherein the subclass introduces a new method gget that is dependant on the object’s history (it can only be called if the last method called was put). This requires the addition of an instance variable after-put that will keep track of the necessary state-history. Because of interactions with base-class methods, all methods will have to be re-implemented. (put must set after-put = True while all other method must set after-put = False)

  • Modification of Acceptable States
  • wherein the subclass introduces new methods lock, unlock that adds to the states already available in the base-class. So that once the object is locked none of the methods can be accessed until the unlock method is called. This would be most useful as a mixin class. It still requires the re-implementation of the base methods, even if guards on the methods are used.

The paper talks about some of the solutions that have been tried (guards on method calls, explicit state availablitiy (Rosette), Path Expressions, Explicit Method Reception, etc..) The authors rightly conclude that

…in order for OOCP languages to be usable for large-scale programming, the inheritance anomaly needs more thorough theoretical analysis, plus derivation of a good solution. We need to strive on the following:

  • Establishing a more precise and formal definition of classification of inheritance anomaly. …
  • Further identification of a general class of synchronization schemes with respect to anomaly classifications. …[Categorization and study of all the inheritance and state interactions]
  • Proper development of languages (features) that either totally avoid or minimize inheritance anomaly, based on the above two analysis.

It is my opinion that most, if not all, of these problems stem from what I think is the bane of concurrency models: mutable state. I really think that the object-oriented paradigm has incredible utility (esp. in GUI), and I’ve fully embraced thinking in OO terms. Now, as I descend more into the concurrency and functional world (through Erlang) I’m discovering exactly how much my mind depends on the imperative patterns. In my experience, banishing mutable state is going to be the only way to write concurrent programs that will scale naturally. Though I’d still like to bring along inheritance, poly-morphism, and mix-ins to the world of concurrency. The marriage off these two paradigms is going to be difficult to reconcile.

Without mutable member, OO programming will look very different. Each method will be force to return a new object that encapsulates the change in state. This does however, simultaneously help solve the Circle-Ellipse issue. But it just seems inefficient. We want to think of the object as having an internal, mutable state. Perhaps the compiler or interpreter can do some behind-the-scenes optimization.

Call-Trace capability in IDEs

Posted by Eric Hennigan
On November 3rd, 2007 at 00:11

Permalink | Trackback | Links In |

Leave a Comment |
Posted in Comp*, Ideas, Language

I was thinking about the features that I’d want in a perfect language, and recalled something I was reading about compiler type-checking. I believe that the common type-checking algorithm can be adapted to do checking of interfaces. An IDE should be able to tell you not only the variables and names within the current scope, but also the attributes of those variables (via type inference). We can go farther and do this for functions. Each function can be associated with a return type, and the IDE via syntactical inspection of that function and back-tracking of calls and variables, should be able to tell you the exceptions that might be raised at any point in your code.

This way, you are free to let exceptions propagate to higher levels, and the IDE will keep you informed every step of the way. At the highest level, you should be able to get the IDE to say wether or not any exceptions have gone potentially uncaught or variables unsafely typed. You should also be able to obtain information about where those events occur in the form of a call trace.

For example, if we are editing code in main() then the IDE should be able to alert us that a ZeroDivisionException could propagate up to this level, and that it originated in a function 12 function calls deep (with the call-trace available for inspection) because we were careless when we wrote that subroutine for calculating average last month and forgot to catch the case when N=0.

This places a very heavy burden on the IDE, because it will have to check all possible return situations, and exceptional conditions that can occur at every point in the code. Fortunately, in the case of exception handling, we can think of functional call dependency as a graph. So, for every function A that depends on functions B and C, the exceptions that might get raised at A are any uncaught exceptions raised via calculations within A + exceptions raised in B + exceptions raised in C. Memoization and dynamic programming come to the rescue. Function return types can be similarly evaluated.

Aside: The information that can be gathered at author time might just convert me from my current fanship of dynamic typing languages. Principally, I despise statically typed languages because they are verbose in a terribly uninformative way. All too often I find myself wanting to do the same thing to stuff of different types (like have a container hold both Int and Real). The whole Design Patterns book is really a bunch of techniques to work around the static type system in C++. In an Ideal world each type would correspond to an encapsulated thought, and we’d all be doing Domain Specific Programming with our types. However, this marvelous idea just can’t be implemented with duck-typed languages, due to lack of information at compile-time.

Further Aside: Also, the programmer shouldn’t have to explicitly name the types, it’s the functional interfaces that really matter. Types can be inferred by the language itself. Each variable should be able to hold a range of possible types when the compiler does it’s checking. If the operation is allowed for each type then no errors, otherwise one of the variables had a type incompatible with that operation, and it’s type should automatically be call-traced back to the variables’ instantiation/declaration.

Note: I have no idea how this call-trace idea for program validity will work in the world of meta-programming. But for simple programs this is entirely do-able. If I, as a programmer, can think of almost all the situations each function works under, then the automatic code-inspector should be able to do the same in the time it takes for me, the slow human, to write that function. Partial results can be cached in a file of annotations (like python does with bytecode, and make does with object code).