Language – eric the fruitbat

Flexible Iterators

erich — Thu, 31 Jan 2013 09:18:58 +0000

Java has some odd quirks which make it far more inflexible than it needs to be. For example, many programs have data structures which need to be iterated both forwards and backwards, and some algorithms require treating the first or last element differently than the others. My goal here is to find a tweak to the Java language that would permit use of the foreach loop in all of these cases. To achieve that, it would be nice if the data structure in question could return different iterators appropriate to the task at hand.

Let’s first review the laborious, multi-step process of equipping a class to support the foreach syntactic sugar.

Declare the class so that it implements Iterable.
Define a method Iterator iterator().
Implement an appropriate class DataIterator that implements Iterator.
Define the methods supported by all Iterators: next(), hasNext(), and remove().
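
The four steps above can be sketched as follows. This is a minimal illustration, not code from the parser project: the class name DataList and the read-only remove() are assumptions, and the comments map each piece back to a step in the list.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Step 1: declare the class so that it implements Iterable.
// DataList is a hypothetical container used only for illustration.
class DataList<T> implements Iterable<T> {
    private final T[] items;

    @SafeVarargs
    DataList(T... items) {
        this.items = items.clone();
    }

    // Step 2: the single no-argument method that the foreach loop calls.
    @Override
    public Iterator<T> iterator() {
        return new DataIterator();
    }

    // Step 3: the iterator class (an inner class, so it can read items).
    private class DataIterator implements Iterator<T> {
        private int next = 0;

        // Step 4: the methods supported by every Iterator.
        @Override
        public boolean hasNext() {
            return next < items.length;
        }

        @Override
        public T next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            return items[next++];
        }

        @Override
        public void remove() {
            // Assumption: this sketch is read-only.
            throw new UnsupportedOperationException("read-only sketch");
        }
    }
}
```

With this in place, for (String s : new DataList<>("a", "b", "c")) visits the three elements in order.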

The foreach loop, which looks like:

for(Data d : collection) {
  // do something with d
}

desugars into

Iterator<Data> it = collection.iterator();
while (it.hasNext()) {
  Data d = it.next();
  // do something with d
}

Knowing this implementation, I can achieve my stated desire in one of two ways: (1) a syntax change or (2) a semantic change. I shall first present the syntax change because it’s such a horrible idea.

Syntax Modification

We first observe that the foreach loop desugars into a call to iterator() passing no arguments. This constraint forces each class into a box where it can only implement one kind of iteration. Immediately, my object oriented (damaged) brain thought to alleviate this constraint by overloading the iterator() method with versions accepting different arguments. For example, a DataIterator which allowed slicing might be called with three arguments: start, end, and stride. The foreach loop syntax can be extended to perform this lookup by a small tweak to the desugaring:

for(Data d : collection)(start, end, stride) {
  // do something with d
}

The syntactic change can even be made backwards compatible with the use of varargs.

Semantic Modification

Change the semantics of the foreach desugarer so that it can handle both Iterables and Iterators.
This is by far the easiest change, because I find it comparatively easy to equip my class with many well-named methods which each return an Iterator.
The same example reads much better now:

for (Data d : collection.slice(start, end, stride)) {
  // do something with d
}

Instead of slice I could also implement reverse or any other kind of iteration.

I wondered why the Java implementors did not already put in the ability to handle both Iterables and Iterators.
An answer on StackOverflow, of course, held the standard objection:

The reason for-each loops require an iterable is to allow the same object to be traversed multiple times (so that you can use multiple for-each loops over the same object without surprising behaviour), whereas an iterator only allows one traversal. If for-each allowed iterators to be used, the behaviour would be surprising to programmers who didn’t realise that their iterators would be exhausted after the loop is run.
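
In Java as it stands, a partial workaround is for methods such as slice to return an Iterable rather than an Iterator. The sketch below assumes Java 8 (for the lambda) and uses hypothetical names (DataSeq, slice, reversed); it also assumes start >= 0 and stride > 0. Because every call to iterator() builds a fresh iterator, the returned object can be traversed more than once, which sidesteps the objection quoted above.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical collection; slice and reversed are illustrative names.
class DataSeq<T> implements Iterable<T> {
    private final List<T> items;

    DataSeq(List<T> items) {
        this.items = new ArrayList<>(items);
    }

    @Override
    public Iterator<T> iterator() {
        return items.iterator();
    }

    // Elements at start, start + stride, ... strictly below end.
    Iterable<T> slice(int start, int end, int stride) {
        return () -> new Iterator<T>() {
            private int i = start;

            @Override
            public boolean hasNext() {
                return i < end && i < items.size();
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                T value = items.get(i);
                i += stride;
                return value;
            }
        };
    }

    // Elements from last to first.
    Iterable<T> reversed() {
        return () -> new Iterator<T>() {
            private int i = items.size() - 1;

            @Override
            public boolean hasNext() {
                return i >= 0;
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return items.get(i--);
            }
        };
    }
}
```

The loop then reads exactly like the proposed version: for (Data d : collection.slice(start, end, stride)) { ... }.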

Closing Remarks

My postdoc Per pointed out that Python programmers obviously have an easier time. Because Python has dynamic lookup mechanisms that allow the for-in loop to accept any iterator, it supports much more composition, allowing filters, generators, and iterators.

Stack Storage and Garbage Collection

erich — Wed, 12 Sep 2012 02:50:16 +0000

Recently, I ran across Mike Vanier’s page containing his opinions on Scalable Computer Programming Languages. I agree almost entirely with his list:

garbage collection

no pointers or pointer arithmetic

a foreign function interface to the C language

static type checking with type inference

support for exception handling

run-time error checking for errors that can’t be caught at compile time, like array bounds violations and division by zero

support for assertions and design by contract

a powerful, statically checked module system

support for object-oriented programming

support for functional programming

structural macros

support for components

a simple, consistent and readable syntax

But I do want to pick on garbage collection.

A couple weeks back I was implementing the Parser for the compilers class I’ll be teaching in fall. I want the Parser to print out which grammar rules it called during recursive descent, so that I can use that output as a check on student implementations.

Because the implementation is in Java, I resorted to calling two functions in each grammar method: enterRule(String rule) and exitRule(String rule). The call to exitRule is necessary for tracking indentation level in the output.

What I really wanted was a decorator that would automatically intercept calls to the grammar methods, inserting calls to enterRule and exitRule. I attempted to accomplish this via reflection, but gave up when I discovered that the compiler optimizes the self-dispatch to recursive methods (when one grammar rule calls another).

Had this been C++ I would have implemented a scope guard object. Its constructor would make a call to enterRule, and the destructor would call exitRule. I don’t trust this pattern in Java though, because it doesn’t provide a distinction between heap-allocated and stack-allocated objects. The call to exitRule must be made at the time the grammar rule exits, which happens to coincide with the time a function-scope stack-allocated object destructs.
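
The closest approximation I know of in Java is try-with-resources, assuming Java 7 or later. The sketch below is not the parser from the class: RuleTracer, Scope, and TinyParser are hypothetical names, and only enterRule and exitRule come from the description above. It is still not a destructor. The guard lives on the heap, and the try block has to be written out by hand in every grammar method.

```java
// Hypothetical tracer; enterRule and exitRule are the names used in the post.
class RuleTracer {
    private final StringBuilder log = new StringBuilder();
    private int depth = 0;

    private void indent() {
        for (int i = 0; i < depth; i++) {
            log.append("  ");
        }
    }

    private void enterRule(String rule) {
        indent();
        log.append("enter ").append(rule).append('\n');
        depth++;
    }

    private void exitRule(String rule) {
        depth--;
        indent();
        log.append("exit ").append(rule).append('\n');
    }

    // Guard object: construction calls enterRule, close() calls exitRule.
    final class Scope implements AutoCloseable {
        private final String rule;

        Scope(String rule) {
            this.rule = rule;
            enterRule(rule);
        }

        @Override
        public void close() { // declares no checked exception, so callers need no catch
            exitRule(rule);
        }
    }

    Scope scope(String rule) {
        return new Scope(rule);
    }

    String output() {
        return log.toString();
    }
}

// Hypothetical grammar with two rules, where expr calls term.
class TinyParser {
    final RuleTracer tracer = new RuleTracer();

    void expr() {
        try (RuleTracer.Scope s = tracer.scope("expr")) {
            term();
        } // close() runs here, even if term() throws
    }

    void term() {
        try (RuleTracer.Scope s = tracer.scope("term")) {
            // consume tokens here
        }
    }
}
```

Calling expr() on a fresh TinyParser records the enter and exit lines for expr with the lines for term nested one level deeper between them.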

So I would add some detail to the above list: Either I need compile-time, type-checked decorators, or I need the language to distinguish between allocations on the stack vs heap. I’m fine if the garbage collector non-deterministically destructs heap-allocated objects, but I want stack-allocated objects to destruct at the time their scope pops.

Even better: maybe the ideal language would provide some statically verified mechanism for aspect-oriented crosscuts.

Booze Poetry: Three Philosophers

erich — Mon, 27 Aug 2012 00:13:12 +0000

Last night I drank a Three Philosophers Belgian Style Blend (Quadrupel), and decided to write some more descriptive verse:

Tonight, I drank up the philosophy of my three Belgian friends. Throughout the night we opined strongly about the mysticism of our bubbling climate. The arguments flowed with intense flavor, penetrating deep into my consciousness. The quadrupel tension of argument resolved itself only at the break of dawn and wearing of mind.

Definitely not as good as my previous scotch description.

Booze poetry

erich — Thu, 21 Jun 2012 19:47:44 +0000

Last night, Ben and I conversed about mixing drinks. He pointed me to a drink containing Chartreuse VEP by the Cocktail Whisperer. Inspired by the opulent verbiage, I wrote the following:

Not satisfied with an insubstantial vodka, I prowl around the liquor cabinet. Deep in back, under cover of dust, I find a dark and mysterious spirit. When opened, the bottle emits a foggy vapor reminiscent of the peat bogs of Scotland. When drunk, that same vapor clouds the mind in a layer of thoughtful mist which doesn’t clear ’till next morning’s sun.

Which must surely be the most romanticized description of a hangover there ever was.

Cognition and Linguistics

erich — Fri, 25 May 2012 19:47:29 +0000

I see the study and development of computer languages as two sides of the same coin. A computer language should enable the programmer to express, clearly and concisely, an algorithmic intent. It should not burden the programmer with a particular model of computation, ex.

Cognition of Linguistics
————————–

In order that we express to computers what we really mean, we should carefully study the relationship between algorithms and the language used to express them. We should abstract out the ideas which lead to certain forms of expression over other forms. We should also study how the familiarity of a single computational model restricts the development of higher-order abstractions.

Linguistics of Cognition
————————–

Insofar as reasoning is the manipulation of symbols, we must seek to understand the influence that a particular language has on the relative ease of certain abstractions.

Embedded Languages

erich — Sun, 29 Apr 2012 02:12:47 +0000

I don’t like them.

I’ve ranted before about how the Web is a festering polyglot made horrific by Postel’s Law. Many, including Tim Bray, advocate more knowledge at the client end, when an error occurs in parsing the steaming pile of HTML that forms today’s Web pages. I almost fell in line with this reasoning, because more information is better, right? I thought a draconian policy would so irritate customers that businesses would be quick to fix it, and expend much effort on prevention. So, all the Web becomes well-formed.

Oh how wrong I was!
Jeff Atwood recounts an interesting tale at http://diveintomark.org/archives/2004/01/14/thought_experiment:

Imagine that you posted a long rant about how this is the way the world should work, that clients should be the gatekeepers of wellformedness, and strictly reject any invalid XML that comes their way. You click ‘Publish’, you double-check that your page validates, and you merrily close your laptop and get on with your life.

A few hours later, you start getting email from your readers that your site is broken. Some of them are nice enough to include a URL, others simply scream at you incoherently and tell you that you suck. (This part of the thought experiment should not be terribly difficult to imagine either, for anyone who has ever dealt with end-user bug reports.) You test the page, and lo and behold, they are correct: the page that you so happily and validly authored is now not well-formed, and it not showing up at all in any browser. You try validating the page with a third-party validator service, only to discover that it gives you an error message you’ve never seen before and that you don’t understand.

You pore through the raw source code of the page and find what you think is the problem, but it’s not in your content. In fact, it’s in an auto-generated part of the page that you have no control over. What happened was, someone linked to you, and when they linked to you they sent a trackback with some illegal characters (illegal for you, not for them, since they declare a different character set than you do). But your publishing tool had a bug, and it automatically inserted their illegal characters into your carefully and validly authored page, and now all hell has broken loose.

You desperately jump to your administration page to delete the offending trackback, but oh no! The administration page itself tries to display the trackbacks you’ve received, and you get an XML processing error. The same bug that was preventing your readers from reading your published page is now preventing you from fixing it! You’re caught in a catch-22. … All the while, your page is completely inaccessible and visibly broken, and readers are emailing you telling you this over and over again.

…

Here’s the thing: that wasn’t a thought experiment; it all really happened. It’s a funny story, actually, because it happened to Nick Bradbury, on the very page where he was explaining why it was so important for clients to reject non-wellformed XML. His original post was valid XHTML, and his surrounding page was valid XHTML, but a trackback came in with a character that wasn’t in his character set, and Typepad didn’t catch it, and suddenly his page became non-wellformed XML.

The moral of the story is actually not about well-formedness and draconian client validation, but one of security. It should not be possible for somebody else to break your system. The mechanism by which we include foreign content into our pages is fundamentally broken. HTML systems usually function as templated string processing, a practice which results in the above problems. It’s an issue of content injection and a lack of sandboxing that only masquerades as one of well-formedness and validation. Embedded languages shall never escape this quagmire.
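
To make the content-injection point concrete, here is a minimal sketch of the least a templating step has to do before splicing foreign text into a page. HtmlText and renderTrackback are hypothetical names, not taken from Typepad or any real templating library. The sketch only neutralizes markup characters; the character-set mismatch in the story above would additionally need the incoming bytes to be validated or transcoded.

```java
// Hypothetical helper, not taken from any particular templating library.
final class HtmlText {
    private HtmlText() {}

    // Replace the characters that HTML treats as markup with entities.
    static String escape(String untrusted) {
        StringBuilder sb = new StringBuilder(untrusted.length());
        for (int i = 0; i < untrusted.length(); i++) {
            char c = untrusted.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&#39;");  break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    // Templated string processing, with the foreign text escaped first.
    static String renderTrackback(String excerpt) {
        return "<li class=\"trackback\">" + escape(excerpt) + "</li>";
    }
}
```

The weakness of this approach is exactly the one described above: every insertion point has to remember to call escape, because the template is a string and not a structure.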

Measuring Effectiveness of a Domain Specific Language

erich — Thu, 05 Apr 2012 22:43:50 +0000

Also, at CGO I met Hassan Chafi, who is working on a graph-based Domain Specific Language. Even though I never seem to find time that I can explicitly devote to studying them, DSLs are, to me, a compulsively fascinating topic. A day or so after the discussion it occurred to me that we need some metrics by which a DSL can be measured. Now, in the general purpose language field Wirth has come up with what is, in my opinion, a very elegant metric: language complexity can be measured by the size of the self-hosting compiler. That works great for general purpose languages that have to do string processing, parsing, data structures, traversals, modeling, etc., each of which is a component of the self-hosting compiler. But it works less well for a DSL, because the focus on a particular domain means it isn’t general purpose.

In the case of a DSL for graphs though, I think the case is clear: It should run graph algorithms well. But which ones? And how do you measure expressibility? It took a couple of days for the answer to arrive in my head. I had at one point encountered a wonderful paper, On variants of shortest-path betweenness centrality and their generic computation, by Ulrik Brandes. This paper provides a dozen related graph algorithms. It is presented in a way that emphasizes the changes between the base centrality algorithm and each variant. This style of presentation helps to measure how well the DSL allows similar algorithmic changes.

So I think it’s a good start to answer the general question, “How to measure the effectiveness of a DSL?”, with a case study. Make a list representative of what you wish to do, and try it out, looking for patterns and variations on a theme.

Experience, CGO 2012

erich — Thu, 05 Apr 2012 05:26:44 +0000

I attended CGO 2012. The speakers were universally boring, but the conversations that you have with other attendees can be quite interesting. For example, I have been thinking that the hodge-podge babel of languages that makes up web applications should be replaced with something more lispy. William Maddox, currently at Adobe, shares the same opinion and was able to introduce me to two really interesting projects.

The first, Meta-HTML, comes from the inventor of Bash, Brian Fox. It runs server-side and implements a full language within the html tags themselves. It uses the CGI abstraction to run mostly independent of the web server. Installation is as simple as placing the interpreter in your PATH (thanks to unix #!). What’s fascinating are the language ideas: The angle brackets become what parens are to Lisp, the tag names become function names, spaces separate positional arguments, and attributes (which use the attr = "value" syntax) become keyword arguments.

The second, curl, comes from MIT in 1998 and is now hosted (and under active development) by a Japanese company, SCSK. It’s a complete document description language and includes dynamic content. Formatting directives (and much other cool stuff) are implemented via libraries and built-in functions. The calls are just like lisp but use curly braces rather than parens. Not only was this what I had originally envisioned, it goes much further! You can type variables as dynamic, in which case they will automatically update content whenever that variable is updated. (Dataflow, just like cells in a spreadsheet.) You can also code up your own threads! So you can code a watching circuit that will update content, say every 5 seconds. They have a development environment (unfortunately Windows 7 only).

Express yourself: to the compiler and to your fellow developer.

erich — Thu, 05 Apr 2012 05:23:19 +0000

The keynote speaker at CGO 2012 (Chris Lattner, LLVM) put some crazy thoughts into my head.

Want compiler to know about:

memory disjointness
aliasing
Usage of data structures (array of struct vs struct of arrays)
whether arithmetic is done on a pointer (and the bounds)
invariants (in loops and between methods)

A language needs to be able to express some of these concerns. Not just because the analysis within the compiler benefits from having the data, but because programmers themselves should be documenting these properties. A great programmer knows about the analysis the compiler can perform. A great programmer knows about the assumptions that such analysis requires. And a great language supports the great programmer by allowing her to express these properties within the code itself.

Creating a language that supports these higher-level descriptions allows other programmers to see why a certain portion of code is structured the way that it is. It keeps them from innocently restructuring the code so that the compiler’s analysis fails (and performance is lost). It makes more clear what you shouldn’t say as a programmer.

I’m not familiar with Eiffel, so I may be completely out of place with this example. But it seems to me that Eiffel’s choice to allow the programmer to express explicitly, in source, the invariants of their programs has two distinct benefits: (1) the compiler has more information to work with during program analysis, (2) the programmers are encouraged to think more deeply about their code’s structure. Much compiler research has tried to investigate “How much can we do this automatically (so that we don’t have to change existing code and so that programmers don’t have to learn anything new)?” I’m using Eiffel as an example of why these objectives are actually harmful. In my experience as a programmer, knowing more has always helped. I desire a language that encourages me to know more, and the best encouragement is to have linguistic support for describing higher-level properties. In Eiffel’s case it’s program invariants, but why not include also some of those things in the list above?
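
For contrast, here is roughly what stating an invariant looks like in a language without linguistic support for it. Java has no invariant clause, so this sketch approximates one with a hand-written check; BoundedCounter and checkInvariant are hypothetical names. The property is documented for the next programmer, but it is an ordinary method call, so the compiler learns nothing from it.

```java
// Hypothetical bounded counter; the invariant is 0 <= count <= capacity.
class BoundedCounter {
    private final int capacity;
    private int count = 0;

    BoundedCounter(int capacity) {
        if (capacity < 0) {
            throw new IllegalArgumentException("capacity must be >= 0");
        }
        this.capacity = capacity;
        checkInvariant();
    }

    // The invariant, stated once and re-checked after every mutation.
    private void checkInvariant() {
        if (count < 0 || count > capacity) {
            throw new IllegalStateException("invariant violated: 0 <= count <= capacity");
        }
    }

    void increment() {
        // Precondition: there is room for one more.
        if (count >= capacity) {
            throw new IllegalStateException("precondition violated: count < capacity");
        }
        count++;
        checkInvariant();
    }

    int value() {
        return count;
    }
}
```

Nothing forces a new method to call checkInvariant(), which is the gap that language-level support would close.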

The compiler shouldn’t be a magic black box! It should be a tool that yields increasing benefit the more a programmer devotes to learning how to use it. The benefits should build on each other incrementally, so that even though complete mastery takes 10 years, each incremental step is worth taking.

Editor Wars

erich — Thu, 08 Mar 2012 07:46:49 +0000

Normally, I use vi for most of my editing work. But, I’ve been hearing much about emacs and its ability to do a better job at syntax highlighting, code completion, spell checking, even a writegood-mode for detecting passive voice. Emacs might be a pretty good editing OS, but it doesn’t come with an editing language. Here’s what I mean:

The beginning of understanding the Zen of Vi comes when you realize that you are not memorizing key-bindings, but rather, you are learning a language.

What a beautiful concept! You are not memorizing that “d$” means “delete from here to the end of the line”. You are instead learning how to tell Vi that you want to delete (“d”) from here to the end of the line (“$”). The operation command, “d”, can be seen as a verb, while the target of the operation, the end of the line, “$”, can be seen as the object.

— Grokking the Zen of the Vi Wu-Wei

Because these are “movements” they can also be used as subjects for other “statements.”

Searching forwards or backwards are movements in vi. Thus they can also be used as “subjects” in our “statements.”

In addition to “verbs” and “subjects” vi also has “objects” (in the grammatical sense of the term).

This notion of “prefixes” also adds the analogs of grammatical “adjectives” and “adverbs” to our text manipulation “language.” Most commands (verbs) and movement (verbs or objects, depending on context) can also take numeric prefixes.

— What is your most productive shortcut with Vim? (StackOverflow)
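
The verb-plus-movement idea in the quotes above can be sketched as a toy interpreter. This is not how vi is implemented: TinyVi, Motion, and Operator are hypothetical names, the buffer is a single line, and U is a single-letter stand-in for Vim’s gU. The point is only that operators and motions are registered separately and every pairing works without being defined individually.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// A motion maps (line, cursor) to a new cursor position.
interface Motion {
    int target(String line, int cursor);
}

// An operator acts on the span [from, to) of the line.
interface Operator {
    String apply(String line, int from, int to);
}

// Toy single-line editor: a command is one operator key plus one motion key.
class TinyVi {
    private final Map<Character, Motion> motions = new HashMap<>();
    private final Map<Character, Operator> operators = new HashMap<>();

    TinyVi() {
        motions.put('$', (line, cur) -> line.length());   // end of line
        motions.put('0', (line, cur) -> 0);               // start of line
        motions.put('w', (line, cur) -> {                 // start of next word
            int i = cur;
            while (i < line.length() && line.charAt(i) != ' ') i++;
            while (i < line.length() && line.charAt(i) == ' ') i++;
            return i;
        });

        // d: delete the span.
        operators.put('d', (line, from, to) ->
                line.substring(0, from) + line.substring(to));
        // U: uppercase the span (stand-in for Vim's gU).
        operators.put('U', (line, from, to) ->
                line.substring(0, from)
                        + line.substring(from, to).toUpperCase(Locale.ROOT)
                        + line.substring(to));
    }

    String run(String command, String line, int cursor) {
        if (command.length() != 2) {
            throw new IllegalArgumentException("expected operator + motion");
        }
        Operator op = operators.get(command.charAt(0));
        Motion motion = motions.get(command.charAt(1));
        if (op == null || motion == null) {
            throw new IllegalArgumentException("unknown command: " + command);
        }
        int target = motion.target(line, cursor);
        return op.apply(line, Math.min(cursor, target), Math.max(cursor, target));
    }
}
```

Two operators and three motions give six commands; adding a fourth motion yields two more commands without touching either operator.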

I can’t help but wonder: for all the verbiage spilled about how Lisp encourages solving your problem in a domain specific language you create just for that purpose [Clementson’s Blog, The Lisp Difference; John Foderaro, Lisp is a Chameleon; Paul Graham, Beating The Averages], how did this concept of an editing language elude the author of Emacs?

So I read what RMS had to say about Emacs’s beginnings. It didn’t start out with Lisp, but rather was an interpreter written for TECO, which itself did have an editing language. But RMS describes it as “extremely ugly …, as ugly as could possibly be.” After adding several extensions to TECO, RMS observes “the language that you build your extensions on shouldn’t be thought of as a programming language in afterthought; it should be designed as a programming language. In fact, we discovered that the best programming language for that purpose was Lisp.”

I’m still flummoxed. Why, after seeing that TECO was an editing language, were the Emacs commands not created likewise? Why is delete-char a separate keystroke, rather than a command followed by a movement? Why are Emacs movement commands flung over the keyboard according to name, rather than position? If you find you need Lisp to design an extension language, why didn’t you also see that editing itself should be a language? Why did this not escape Bill Joy, who was implementing vi in C, so that it could be used over a 300 baud modem?