eric the fruitbatBlog
Sounding out the Noosphere.

Posts from October, 2007

Unreliable Intuitions

Posted by Eric Hennigan
On October 30th, 2007 at 13:10

Permalink | Trackback | Links In |

Leave a Comment |
Posted in Idiocracy, Language

Qt is a really wonderful and nice toolbox, and I thoroughly enjoy working with it, so it’s unfortunate that I’m using it for my example of one of the many things wrong with C++. In my opinion C++ is a kludge for bolting the Object Oriented paradigm onto a language barely more sophisticated than assembly. This unholy marriage has resulted in a language that is rife with special-case exceptions and horrifically complicated syntax. Just take a look at the STL and you can actually feel the hemorrhage of pre-mature optimization.

Anyway, Pointers are long known to be the most problematic concept for students of C/C++. Truly grokking pointers took me about 2 weeks initially, and I still get to enjoy learning another nuance about every other month. The ability to control the memory of your machine is a power not to be accepted lightly. We had an opportunity to leave pointers behind with Fortran and Lisp, but K&R apparently enjoyed living at the machine level. To this day we still get to hold pointers responsible for much of program ailments. (buffer overflow, memory leaks, null dereference, etc…)

One of the other real sticky issues that novices have to deal with is the difference between the stack and the heap. This is usually introduced alongside the introduction of pointers, because the two concepts are rather intricately related. But all this just makes the conceptual cliff a bit steeper. Eventually, the junior programmer develops a nice intuition about allocation and optimization, and when to use stack vs when to use head and handle the memory yourself. Then, the awesome power granted to us via pointers rears its ugly head.

Straight from the Qt Documentation on QString

Note for C Programmers

Due to C++’s type system and the fact that QString is implicitly shared, QStrings may be treated like ints or other basic types. For example:

     QString Widget::boolToString(bool b)
     {
         QString result;
         if (b)
             result = "True";
         else
             result = "False";
         return result;
     }

The result variable, is a normal variable allocated on the stack. When return is called, and because we’re returning by value, the copy constructor is called and a copy of the string is returned. No actual copying takes place thanks to the implicit sharing.

Well, isn’t this nice? Because of pointers we are permitted a really handy optimization. But, most of the time strings aren’t long enough to justify the sacreligious violation of hard-won intuitions. The real problem here is that nothing apparent in the syntax of this little snippet that lets the programmer know what’s really going on. It looks like everything is being copied from the stack to the return value. So we are obligated to read the documentation (which you should do for any toolkit, as they all do things like this) or possibly shoot ourselves in the foot.

Now this particular optimization is fairly well known, and used in many container classes. I’m not faulting Qt here, I put the blame squarely on the design of C++. This is a premature optimization, that messes with things behind-the-scenes, and it violates our intuitions. I would be willing to accept this type of behavior if there were some syntactical clue, but there isn’t. Programmers, esp. those writing toolkits, are really smart people, and I recognize their superior skills in crafting the incantations that permit the journeyman to have optimized code nearly by accident. But it’s still magic, and the language’s we use should be more explicit.

Note: This particular design has never caused me any troubles. I’m writing this from a linguistic design point of view. I don’t think it’s safe to give ourselves this type of power, because, ultimately, I know that this behind-the-scenes style black-magic has been abused and subsequently gotten many of us in serious trouble. Violating user expectations is always a sin.

Functional Programming with Erlang

Posted by Eric Hennigan
On October 28th, 2007 at 11:10

Permalink | Trackback | Links In |

Comments (3) |
Posted in Comp*, Language, Punditry

Last week I was reading Joe’s new book Programming Erlang and just couldn’t seem to get my mind around Functional Programming. You see when Joe says

Concurrency is easy.
We don’t have shared memory. I have my memory, you have yours, we have two brains, one each, they are not joined together. To change your memory I send you a message, I talk or wave my arms. You listen, you see, your memory changes, but without asking you a question or observing your response I do not know that you have received my messages.

bHe’s completely correct. So, why is Erlang a functional language then? I ask this Because; if you have memory then you have state, and your state is modifiable. Each process should have an associated heap, or at least mutable declared statics. When I send you a message you should be able to update your private store of knowledge.

I know that what I’m talking about can be achieved in a functional program, that to implement this you’d use a variable in the functional call, which is passed via tail-recursion, and that this isn’t too inconvenient because we use tail-recursion anyway just to receive the next message. But if you’re an agent with a large accumulated dictionary of knowledge, making a new copy of that dictionary to send to yourself is rather wasteful of memory. Since it’s your knowledge you should be able to modify it in-place. Just like what we do in the real world.

So while Joe has the right idea about modeling things from the real-world using concurrency, I feel that he’s got the implementation wrong. But it just might be that my growing up on imperative languages has broken my mind.

Linguistic Comparisons

Posted by Eric Hennigan
On October 21st, 2007 at 17:10

Permalink | Trackback | Links In |

Leave a Comment |
Posted in Comp*, Language

Much of what is done is the way of linguistic comparison seems to be a bit subjective. The most favorite is to gather a bunch of idioms and then set them against each other drawing obvious conclusions, like language X is better than Y if you need a way of describing behavior Z. Though interesting in it’s own right (through this technique we can amass a large collection of idioms and probably use them as the building blocks for Domain Specific Languages), it doesn’t achieve as much as I’d like, mostly because of it’s isolated comparisons. If you compare languages in this way you rapidly enter an overwhelming set (scales an N^2) of idiom-vs-idiom comparisons as you increase the number of languages under your microscope.

So the best approach to linguistic comparison to date is probably the Great Language Shootout. They take the approach that you first get a set of problems that reasonably abstracts the entire set of real-world problems that coders have to deal with. So, we have a few small problems, each representative of some class: binary-trees, k-nucleotide, mandelbrot, etc.. Then we track the performance of each code sample, both in space (memory) and time. Because no single language is suited to all the different problems that can occur, we can find the best-of-breed by adding up the ranks of each language and then sorting. But this only measures computational efficiency, which has long ago gotten so cheap that ordinary mortals don’t waste their time on fancy algorithms [rule 3. under Complexity].

So, what we’d like to track is how efficient the program describes the solution to the problem, and how quickly that solution actually runs. That is we’d like a measure of how naturally a certain language describes the algorithm in addition to how well the language implementation translates that to hardware. So let’s take a lesson from algorithmic information theory, and simplistically take the length of the program (in lines) as a measure of its ability to compress the algorithm into text. (Yes this is really simplistic, a count of tokens going through the lexer would be more appropriate). Then we use a formula (something like):

\Delta_{ij}E = \frac{T_i S_i – T_j S_j}{K_i-K_j}
To get a measure of the Time*Space change with program complexity (measured with a length metric), or a measure of the change in efficiency when moving from one language to another. By doing this for the set of problems in the Language Shootout, we can get a feel for what benefits are gained and what abilities lost when translating an algorithm from one language to another. Hopefully this can all be plotted on some sort of manifold, after which we can backtrack to what idioms are most useful for which classes of problems. But this time, with quantitative data to back up the subjective comparisons that we’ve been using to date.

Note: another useful formula for linguistic efficiency might be E=TS/K where we measure the amount of data that can be manipulated per line.

Update: The formulas given really favor languages that have more stuff built-in, or more stuff in their standard lib. I do not thin this is unfair, programmers really enjoy, nay require abstractions in their toolbox. Nobody wants to go implementing yet another linked list every other week, We want that stuff built-in; it increases the language’s expressivity, and standardizes the approach to problems. Weighting in favor of these types of languages is a Good Thing™. Also you’d expect the code that you pull in in this way to be more efficient (general case) than what you’d write yourself. If not, then the formula still penalizes on the Time*Space usage.

Update: I’m wondering wether or not the set of problems chosen by the Language Shootout is a good representative of the class of all problems. Perhaps we should also include some of what might be considered non-language-specific design patterns, like State Machine simulations, Dynamic Programming, Constraint Solving, etc…

The Jet-powered Car

Posted by Eric Hennigan
On October 10th, 2007 at 17:10

Permalink | Trackback | Links In |

Leave a Comment |
Posted in Ideas, Mech*

I had a really long discussion about car engines with my co-worker today. He has tons of information locked in his head about internal combustion design. We touched on valve and cylinder design, compression ratios, cam design. Discussed all the ways that race teams have tried to optimize fuel delivery, from alternative valve designs (rotary, spinning cylinder, spinning disk with holes, solenoid, etc..) so that standing waves in the spring don’t float the valve, and so that space isn’t wasted on the valve stem, and so that timing can be made more computerized (independent control for each cylinder) by removing the mechanical linkage to the cam shaft. It turns out that they shape the delivery tubes so that the compression/rarefactions that propagate through the inlet and outlet system are designed so that just when one delivery valve has to open up, a compression wave is standing right behind it pushing that fuel/air into the chamber, and just when an exhaust valve needs to open up a vacuum is being pulled by the exhaust from a separate cylinder. I think that all this design gets frighteningly complicated.

Finally I decided that the internal combustion engine just has way too many parts and too high tolerance, and requires too much active control to keep running optimally. So I think that vehicles should have an engine with continuous burn. This comes in two popular camps: the Jet Engine, the Steam Engine, and the Stirling Engine.

Now since a steam engine requires a working fluid and all sorts of other stuff (boiler, radiator if water is to be recycled, pistons or steam turbine, etc..) it quickly becomes just a complicated as the internal combustion. Plus, I think the fastest you could get the car warmed up and running would be about 30 seconds. (which you should be doing anyway on cold, winter days).

The Stirling Engine is good but it’s primary usefulness is for really low temperature differentials. And it would also require a trapped working fluid and some pistons. Too many working parts.

The Jet Engine has gotta be the way to go. They unfortunately like to spin at really high rpm (80,000) and so have tolerance and balance issues. But, if you could make a lower rpm version (10,000) then attach that to a good generator, then the car on could run solely on electric motors. This has all the advantages of electrical power distribution: extreme and perfect acceleration, drive by wire, rapid response to traction control, regenerative breaking, etc… Also, jet engines can be made really quiet by ducting the exhaust through a manifold. All it requires is that air move smoothly through the engine at a fairly good clip. The most complex part about the entire car would then be the fan assembly, and computers/sensors take care of everything else. Also, the engine could be run at optimal fuel efficiency all the time, and isn’t as sensitive to the type of fuel. As a bonus, the generator could probably be made to double as the starter-motor.

Relational Programming

Posted by Eric Hennigan
On October 5th, 2007 at 02:10

Permalink | Trackback | Links In |

Leave a Comment |
Posted in Comp*, Ideas, Language, Tech*

Back in the dark-ages of computing we all had a hierarchical database with which to store our precious information. It was known as the file system. Then efficiency became vogue and organizational demands moved everyone to explicit database applications (client-server model, of course). Eventually, the long-festering and ubiquitous sore of inherently non-hierarchical data in combination with the prophets of higher learning revolutionized the way in which we think about our data. Thus was created: The Relational Database.

Meanwhile, over in Software Engineering land, we’ve moved from punched cards and mentally computing address offsets through the grand era of Structured Programming and have finally landed, and fully-embraced, the Object Oriented model of software architecture. Most of the intellectual effort expended within this paradigm is on building object hierarchies, and worrying about maintaining encapsulation, and getting the right and proper amount of inheritance without violating the Liskov substitution principle.

Does this seem backwards to anybody? The Database world has progressed beyond the hierarchical model because of a realization that real-world data does not conform to a hierarchy, while all the software developers continue to struggle to model the world with inheritance hierarchies. And the more we force this model in our software design the more perverse the problems become, and pretty soon “friends have access to each other’s private members.”

We need to break the mold and democratize our objects! They need to be individualized, not just anonymous members of a large hierarchy. I believe that maintaining the strict inheritable and environmental constraints is holding the industry back. The time has come for Independence! Our objects should be promoted to full fledged processes. This is the philosophy of Erlang. Too much memory and processor time is wasted on the construction of even a Thread at the OS level, so concurrency must be implemented by the language itself. It should be trivially easy to create and destroy processes in the thousands, calculating en-mass at the whim of the programmer. To achieve this lofty goal, each process must be independent from others, no inheritance of environment, and no shared memory. Of course all this works much more naturally in a functional language, which already gets the programmer thinking about functions and first-class citizens. Now they become honored as processes.

The Language of the Future

Posted by Eric Hennigan
On October 1st, 2007 at 09:10

Permalink | Trackback | Links In |

Comments (1) |
Posted in Comp*, Ideas, Language

Most of the popular languages today (Perl/Python/Php/Ruby, C++, Java) are all locked into the serial, von-Neumann model of computation. Concurrency is expressed as Threads and Processes, which rely on the underlying OS’s handling of memory and other resources. Joe Armstrong has demonstrated that this just isn’t good enough; the programmer needs to have truly independent processes together with the ability to spawn them with nearly zero cost. Because we’ve hit a point of diminishing returns trying to optimize single CPU’s the industry has begun the move toward multi-core machines. This only benefits programs that are written with concurrency in mind. So the language of the future will have extremely lightweight concurrency built-in (just like Erlang).

But what new problems will it solve? One of the best, and often overlooked, problem domains is that of constraint solving. These problems show up everywhere, especially scheduling and routing. Business would love to have a language specifically geared toward solving problems of resource allocation. Prolog already does this very well, but never caught on. I think that many constraint problems can be solved even more rapidly by exploiting extreme concurrency. So the language of the future will popularize an old class of problems: the constraint problem; but it will do so with a parallel computing model, and force us to think about our problems in a different way.

I would like the new language to have a clean syntax, like Python. It should be easy to read, and easy for the programmer to manipulate. It should focus on clean syntax primarily because it’s hard to teach an old dog new tricks; most programmers were brought up on the serial model of computation, so they’ll have a bunch of new ideas to swallow, syntax shouldn’t throw them off. It’ll also have to be competitive at solving most of the existing problems: it’ll need to have lots of built-in convenience functions for string manipulation, have a well-organized GUI library, come bundled with library’s for networking, html, parsing, regex, numerical analysis, etc, etc…

I still don’t yet know wether it’ll be imperative or functional, but it’ll probably wind up being functional, which would slow adoption rates.

Update: LtU discussion and related paper