eric the fruitbatBlog
Sounding out the Noosphere.

Posts from May, 2009

Computer Language Comparison

Posted by Eric Hennigan
On May 31st, 2009 at 17:05

Permalink | Trackback | Links In |

Leave a Comment |
Posted in Comp*, Language

Guillaume Marceau has used data from the Computer Language Benchmark Game to provide a graphical comparison of many different languages.

If you drew the benchmark results on an XY chart you could name the four corners. The fast but verbose languages would cluster at the top left. Let’s call them system languages. The elegantly concise but sluggish languages would cluster at the bottom right. Let’s call them script languages. On the top right you would find the obsolete languages. That is, languages which have since been outclassed by newer languages, unless they offer some quirky attraction that is not captured by the data here. And finally, in the bottom left corner you would find probably nothing, since this is the space of the ideal language, the one which is at the same time fast and short and a joy to use.

Of course the C compilers do a very good job on performance, but seem to do average on verbosity (better than I expected). Haskell (ghc) does a surprisingly nice job. I wish that I’d thought to do this kind of visualization, it’s really pretty neat. The only improvement that I could think of, would be to do the performance axis on a logarithmic scale.

Decentralized Garbage Collection

Posted by Eric Hennigan
On May 20th, 2009 at 00:05

Permalink | Trackback | Links In |

Comments (2) |
Posted in Comp*, Ideas

When I visited Washington D.C. for ASPLOS/VEE, I was working on my paper about cascade failure, and had distributed systems on my mind. Some of the papers at the joint conference were about garbage collection. One of the more interestring had an approach called leak pruning, for solving logical leaks. After awhile, I had an interesting idea that merged a bit of these two fields.

You see, distributed systems and garbage collection are both about graph problems. In the case of distributed systems, it’s usually about minimizing traffic along the edges, or performing some kind of global computation by communication with neighbors. In garbage collection, we’re concerned about reference cycles and reachability. But what if you ask: Is it possible for an object (node in graph) to decide, for itself, when it has been disconnected from a root node? In answering this question I came up with a very simple approach. I’ve been sitting on it for awhile, but, just yesterday, the paper was typed up in a hurry and submitted to the Fun Ideas and Thoughts session of PLDI.

The primary idea is this: If root nodes are tagged with a high value, and heap nodes are tagged with a low value, and each object acts as a node performing a distributed computation, then the average value of any subgraph that becomes disconnected from the roots will continuously decay. When it falls below a threshold value, that whole cluster can safely be collected.

There are some implementation details though. It’s going to be easiest to process these updates in an event queue in a separate garbage collection thread, rather than as a real distributed computation. When updating a single node, if we add to the queue all the points-to nodes, then the queue will rapidly grow without bound. But, we actually don’t care too much about how quickly the average worthiness is computed, so we can easily get away with adding only those nodes that are not already present in the queue.

The second problem is that references are directional. So, if we rely only the queue to hold references to objects, then it is possible for a set of nodes to become stranded (referenced neither by the queue, nor by the program) For example, the deleted branch of a subtree. We’d initially process the root of the branch, and proceed down to the leaves, but since it has no cycles we will loose all references to it, because the processed nodes will not be re-entered into the queue. (the program asked for it to be deleted, and the garbage collector does not maintain a list of all allocation sites) But this problem only happens for acyclic structures, so it can be solved by reference counting. When we go to update a node, it is safe to collect it if its reference count is zero, regardless of its worthiness value.

The third problem is that the average worthiness value of the program-reachable cluster of nodes will go down with the number of objects held by the program. This might limit the program size, but I think that the natural range of an unsigned double should be enough to solve this problem. Root nodes can have the value MAX_DOUBLE, and even if the program holds 10^6 objects, the average is still very high.

A fourth problem is the decay rate of unreachable clusters. This will be a function of both the computed worthiness at the time of disconnect, and the number of nodes in the cluster. I will probably have to adjust the formula given in my paper so that it has a faster decay rate. Unfortunately, this and the third problem above are in conflict with each other. Having high valued root nodes allows programs with more objects, but also extends the decay time.

I don’t claim this approach to be efficient, but I do claim that it’s simple to implement, and should scale nicely. Also, it might be useful for process collection in a distributed system. I’ll know if the idea was too crazy when review committee results are released on May 29.

An Atheist Goes to Church

Posted by Eric Hennigan
On May 17th, 2009 at 21:05

Permalink | Trackback | Links In |

Comments (4) |
Posted in Religion, Self

I, together with a group from AAR, went to Voyager’s Church today. I was reminded, in many ways, of why I don’t like Church.

The service opened with a song, All Because of Jesus by Casting Crowns (lyrics)
I was disturbed by the imagery in this song. It reminded me of why some accuse Christianity of being a Death Cult.

It’s all because the blood of Jesus Christ
That covers me and raised this dead man’s life

After that there was a little bit of talk about The Parable of the Workers in the Vineyard, followed by a prayer emphasizing the inherent equality and levelling that God’s forgiveness gives us.

Then we had more songs with disturbing Imagery.

  • Once Again by Matt Redman, which talks about looking at the Cross and being emotionally broken by the sacrifice it represents. But, encourages celebration of this torturous death with the refrain:

    Thank You for the cross
    Thank You for the cross
    Thank You for the cross, my Friend

    Sorry, but I just can’t. I’m on Hitchen’s side with this one. Human sacrifice is both tragic and unnecessary, and I won’t be contracted (against my will) via such a horrific act of violence.

  • The Wonderful Cross by Phillips, Craig & Dean. Again recalls the Imagery of the Cross, and fails to have uplifting lyrics.

    My richest gain I count but loss
    And pour contempt on all my pride

    Oh, the wonderful cross
    Bids me come and die

    Sorry, but I won’t. I have to side with Ayn Rand on this one, Human effort and productivity are to celebrated in their own right, not wantonly sacrificed. We should be happy and rejoice in our labor, not condemn it. Also, the cross is not wonderful, it’s tragic and really speaks a lot about our baser instincts as revealed in an underdeveloped culture.

After the singing, there was a silent prayer, followed by more singing.

  • Expanded version of The Wonderful Cross. With more sacrafical appeals:

    Love so amazing, so divine
    Demands my soul, my life, my all

    Again, I won’t be drawn into a contract involving blood sacrifice against my will. I’d rather have been there to help stop the crucifixion, That’s the moral thing to do.

  • None But Jesus really just attributes moral support to an imaginary character.

    In the moment of my weakness
    You give me grace to do Your will

    All my delight is in You Lord
    All of my hope, all of my strength
    All my delight is in You Lord Forevermore

    I resent the suggestion that I’d require a figure to worship in order to have these qualities. I much prefer the existentialist message that we are intrinsically capable of delight, strength, and positivism without needing to appeal to something outside ourselves. To suggest otherwise only implies that humans are undeserving worms.

  • From The Inside Out was the most disturbing song of the whole service.

    My purpose remains
    The art of losing myself
    In bringing You praise

    In my heart and my soul
    Lord I give You control
    Consume me from the inside out

    So, not only do we have imagery of North Korea, where everyone’s only duty is to praise and worship the lord, for all time! But, we also receive the delightful imagery of some wonderful parasite ‘consuming us from the inside out.’ These ideas really need to be put down.

  • Came to My Rescue simply contains images of prostration.

    Falling on my knees in worship
    Giving all I am to seek Your face
    Lord all I am is is Yours

    My whole life
    I place in Your hands
    God of mercy
    Humbled I bow down

This was followed by a prayer where we were all reminded that our whole lives should be devoted to worshiping and praising the lord. This is truly our purpose in life.

Finally, the pastor got to give us a sermon about the parable read earlier. He went through the development of the parable, nearly line by line. He began by reminding us that the grace shown by this landowner is like the Kingdom of God. Then he told us of the economic disparity that existed between the day laborer and the vineyard owner. Not only did the owner go into market himself to hire help, but he was also willing to discuss the wage dispute at the end of the day. (Apparently this is gracious behavior) The first group of laborers agrees to work for a denarius, while the second agrees to work for “whatever is right”. This automatically implies a scaled earnings contract. Several times he visits the market and hires help. This means that, by the end of the day, he’s hiring the dregs. But, like God, he’s willing to make multiple trips to save us all.

When he does pay their wages, he pays all hired hands ‘the wage’ (there’s only one). This justifiably upsets the workers that were hired first. And they are envious (not merely jealous), that the landowner has “made them equal to us”. He diffuses the situation by reminding them that they agreed to ‘whatever is right’ and ‘a denarius’ for the day’s labor, and also that it’s his money and he can dispense it however he pleases, that giving more to others does not take away from their earnings.

I think the landowner’s an idiot. He’s pays his help unjustly, yet this is called grace. He handles final payment poorly, he’s probably asking for dissent and resentment by paying first the people he hired latest, purposefully raising the expectations of those that were hired earliest. But this is to be a lesson to them, that the grace of God applies equally to all. Such grace does not follow the human intuitions of fairness. I’d rather have fairness, as the grace seems both arbitrary and capricious. He should have been forthright, that wages were based on having worked or not, rather than time and effort spent laboring.

The pastor also railed on and on about how satisfied the landowner must have been to be so generous. Like God, he must have just been wallowing in the gratification of having a beautiful and bountiful vineyard. So much so, that I gathered becomes both smug and economically irrational. But, again, it’s not about how long you spend worshiping, or how many good marks you gather by helping old ladies cross the street, it’s about surrendering yourself to his grace, that equalizes all.

This was followed by a prayer about how we should all be grateful that we have the opportunity, even at the eleventh hour, to serve in God’s fields, that we may be rewarded equally because the joy and (self)satisfaction God feels with his creation fills him with the desire to bestow grace upon us all.

Finally, a last song. You Never Let Go which emphasizes togetherness with the lord, even in troubled times.

So, I didn’t fully agree with the interpretation of the parable. I’d prefer fairness, and forthright agreement as to what is being offered, especially when the contract is different than the socially expected one. I was also disturbed by the imagery evoked in the songs. This is the first time that I’ve attended one of the more ‘vanilla concert’ church services since learning about the psychology of religious belief, and it was instructive.

I’m compelled now, more than ever, to admit that atheists might not be able to compete. We are simply, too independent, too free-thinking, too rational, to embrace such services every week. We aren’t ever going to have these soft, emotional songs about rationality and science. (well, the music was soft, the lyrics weren’t). Atheists simply don’t do the whole sycophant celebration thing. Because of this we have respect for each other, and can work together, but we just don’t have the same community, or accompanying sense of belonging and togetherness.

Maybe next time I’ll go to a service that has the wailing, raving, and speaking in tongues.

Automatic Thesaurus

Posted by Eric Hennigan
On May 7th, 2009 at 20:05

Permalink | Trackback | Links In |

Comments (4) |
Posted in Ideas, Language

Last week, I landed on another PhD worthy research project.

Given a very large corpus of sentences, such as a digitized version of the Library of Congress, or a less noisy version of the Internet, how can you automatically generate a Thesaurus?

At first I thought the problem should be fairly easy, but the more I thought about it the more difficult and daunting the task became. For example, as a first approach, we might assume that textual substitution would be a good proxy for identifying synonymous terms. That is if a couple terms really are synonymous, then they ought to be substitutable for each other in a sentence. With a large enough set of sentences, we should be able to identify such situations, and thereby bootstrap the building of the thesaurus. But there’s a small problem, provided by my good friend EvB:

The sky is blue.
The ocean is blue.

But sky is not the same as ocean. Sure they are similar. A poet could compose a nice metaphor of fish swimming through their sky above the bottom feeders. But this metaphorical relationship isn’t one that would necessarily make it into a human compile thesaurus. So, textual substitution can easily lead us astray.

Continuing with EvB’s particularly good example we can also identify another problem. Suppose that we incorporate a bit of natural language understanding, enough to pull out parts of speech. Then, the system would easily identify the equation of sky with blue, or ocean with blue. But neither of these statements is true either. Usually people take the example to mean not that sky and blue are the same thing, but that the sky belongs to the set of objects that have a property called color, the value of which is blue. So this understanding depends on what the definition of ‘is’ is (obviously not a simple affair). We also would like to avoid drawing a relationship between any of the pronouns and the rest of the language.

Next lets look at how people tend to write. Any good library is gonna be full of metaphor, simile, pun, allusion, word play, sound play, and other such highly nuanced expression. All of these things will trump any reasonably simple attempt at drawing a link between synonymous words. Political propaganda and polemic, will probably be particularly bad at equating terms that should probably be kept logically distinct. Furthermore, at least when I write, I’m reminded of other things during the process, things that are associated, but not necessarily synonymous. These remindings are an important part of the essay writing process, but will certainly throw noise into the digital library.

But if it’s so hard to make a mechanical system for identifying synonyms, then how do humans do it? Here I have a hypothesis: that similar words stimulate similar patterns in the brain. Thus when a human tries to think up synonyms it’s really the same as playing word association with a filter. First, the word stimulates the brain, bring up certain associations. These associations will be based on ‘brain distance’, a measure of the similarity of brain activity for certain words and thoughts. But some associations will be radically different from the synonyms that we’re looking for. For example, antonyms and non-sequiturs often come up in word association games. So a filter is applied to weed these out, and what’s left is passed through a dictionary/meaning check. Anything passing this process will be reported as a synonymous term.

So, in order to really generate a thesaurus, we do need AI (or at least an underlying cognitive model). When I first thought of the thesaurus problem, I was hoping that it was paired down enough, small and simple enough that it would be doable without all this complexity. We might have to reduce the problem further, make it looser. Say, build an association dictionary, rather than a thesaurus. An association dictionary might be possible, because it forgoes the understanding of meaning and similarity, it doesn’t have to question or measure why two words should be associated, only record that they are used similarly.

So, if you can automate the building of a thesaurus, you should get a PhD in Linguistics.

Philosophy of Computer Science: Naming

Posted by Eric Hennigan
On May 2nd, 2009 at 01:05

Permalink | Trackback | Links In |

Comments (1) |
Posted in Comp*, Language, Literature, Philosophy, Religion

For a very long time, western culture has had a strong undercurrent about naming. Conceptually, it starts with the recognition that the ability to name a thing gives you power over it. This is reflected in many deep and ancient cultural mythologies.

The creation story in the Bible begins with:

In the beginning,…the earth was a formless void…. Then God said: Let there be light. God called the light Day, and the darkness he called Night.

So God is able to create the Earth with only his Word, and give life to mankind with only his breath. This power is nearly transferred to Adam, when he is given the task of naming all the plants and animals. Only mankind is given this linguistic power.

Jewish mythology picks up on this issue with the story of the Golem.

In many tales the Golem is inscribed with magic or religious words that keep it animated. Writing one of the names of God on its forehead, a slip of paper in its mouth, or inscribed on its body, or writing the word Emet (אמת, “truth” in the Hebrew language) on its forehead are examples of such words. By erasing the first letter aleph in Emet to form Met (מת, “dead” in Hebrew, when the aleph letter א is cancelled) the golem could be deactivated.

Jewish culture continues this tradition with the Kabbalah’s search for the True Name of God. Other cultures also demonstrate this idea. In witchcraft, a demon is both summoned and controlled by speaking its name. In the Hindu tradition AUM is the sacred word that encompasses everything, and is the sole syllable upon which focus is kept during meditation. The idea is also reflected in more modern works, as clearly expressed in Ursula LeGuin’s A Wizard of Earthsea:

Ged sighed sometimes, but he did not complain. He saw that in this dusty and fathomless matter of learning the true name of each place, thing, and being, the power he wanted lay like a jewel at the bottom of a dry well. For magic consists in this, the true naming of a thing.

Or the so recently popular, Harry Potter, where Dumbledore advises Harry:

Call him Voldemort, Harry. Always use the proper name for things. Fear of a name increases fear of the thing itself.” (PS17)

But how does this relate to Computer Science? Being a very textual discipline, we have many conventions that relate to naming. In Computer Science, we have the ability to create virtual worlds, and thus we need systems of naming the objects within those worlds. At the Language level we see a focus on naming conventions:

  • Hungarian notation, in which variables have a prefix that describes their type, such as strName for a string, or pX for a pointer to X.
  • Fortran, which had an implicit typing scheme, where any names beginning with I, J, K, L, M, N were always integer and the rest were reals.
  • The Ruby on Rails framework, which has the ability to automatically map a model named “Person” to the “people” table in the database just by name inspection.

But naming actually turn out to be a much deeper issue than these linguistic examples show. In the Distributed Systems world, we have a large focus on naming, for a remote resource can only be accessed through its name, in what’s called name resolution. The easiest example to pick on here, is DNS, the system that allows a person to reference a remote computer by using an easy to remember domain (such as www.example.com) instead of a hard to remember physical address (such as 127.0.01). We can also identify a confluence of two separate concepts: The name of a machine can be used to locate it. This allows machines to operate with the previous cultural ‘power of naming’, knowing a machines name gives one access to that machine.

Since my research focuses on computer security, this duality between names and locations can be really critical. For example, there is a model for building secure software, called the object capabilities model, that not only identifies this power of naming, but actually explicitly states it as an axiom of the model:

  • Objects (actors) can interact only by sending messages to unforgeable addresses.
  • An object acquires knowledge of other objects in one of two ways:
    1. It is created with addresses that it receives from its creator
    2. It receives a message with an address to another object.

So, the security of the system is brought down to names. Communication and therefore power over other objects can only be obtained by learning their true names, which must be kept secret (unforgeable). For if a malicious object were able to easily guess the names of other objects in the system, it could quickly wreak havoc.

As such systems work their way into our daily lives, our personal names (read: personal identification) have also become much more important recently, as anyone that has been a victim of identity theft can attest. But this is an issue I won’t go into here. There are also other cultural impacts, for names change the way we think about each other.