<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>eric the fruitbat &#187; Language</title>
	<atom:link href="http://www.cogitolingua.net/blog/category/language/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.cogitolingua.net/blog</link>
	<description>Sounding out the Noosphere.</description>
	<lastBuildDate>Fri, 03 Feb 2012 23:40:56 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Strong Typing for Security</title>
		<link>http://www.cogitolingua.net/blog/2011/11/11/strong-typing-for-security/</link>
		<comments>http://www.cogitolingua.net/blog/2011/11/11/strong-typing-for-security/#comments</comments>
		<pubDate>Fri, 11 Nov 2011 08:13:11 +0000</pubDate>
		<dc:creator>erich</dc:creator>
				<category><![CDATA[Comp*]]></category>
		<category><![CDATA[Ideas]]></category>
		<category><![CDATA[Information Flow]]></category>
		<category><![CDATA[Language]]></category>

		<guid isPermaLink="false">http://www.cogitolingua.net/blog/?p=831</guid>
		<description><![CDATA[<p>I got into a mild argument about static vs. dynamic typing. I recognize that static typing can be verbose to the point of being repetitious. Take Java generics for example:</p> List&#60;String&#62; astr = new ArrayList&#60;String&#62;&#40;&#41;; <p>There really isn&#8217;t a great reason why the compiler can&#8217;t infer the type of the variable on the right hand [...]]]></description>
			<content:encoded><![CDATA[<p>I got into a mild argument about static vs. dynamic typing. I recognize that static typing can be verbose to the point of being repetitious. Take Java generics for example:</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;">List<span style="color: #339933;">&lt;</span>String<span style="color: #339933;">&gt;</span> astr <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> ArrayList<span style="color: #339933;">&lt;</span>String<span style="color: #339933;">&gt;</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span></pre></div></div>

<p>There really isn&#8217;t a great reason why the compiler can&#8217;t infer the type of the variable from the right-hand side of the assignment. C# already implements type inference for this case, and C++ is <a href="http://en.wikipedia.org/wiki/C%2B%2B11#Type_inference">adding it</a>. ML and Haskell are strongly typed and have practiced type inference since their inception. So we should actually dismiss the verbosity objection to static typing right now, because it&#8217;s an artifact of implementation: the more popular languages, C++ and Java, are really poor examples of what could otherwise be a really good thing.</p>
<p>In my opinion, a static typing system is actually a proof system over your code. We shouldn&#8217;t complain about having compiler errors, rather we should rejoice that the compiler is able to automatically detect cases where we were ambiguous or tried to do something ill-defined. We should try to write our code so that the compiler can tell us when we make a mistake. Really, we want to express as many constraints as possible, so that the machine can do more checking and we end up with less buggy code. Statically typing all our variables and expressing systematic constraints is an effort that pays off in spades for large code bases.</p>
<p>But couldn&#8217;t we all just use the more flexible dynamic typing languages, and catch the bugs with testing? In my opinion, no. Testing should be done anyway, but it isn&#8217;t enough to prove the absence of a bug. Only a proof checker, such as a static typing system, can come close to doing that. I think I can really drive this point home by examining web applications.</p>
<h4>The Problem</h4>
<p>Web applications are really glorified string processors. HTTP requests come in as strings, and web pages are emitted as strings. JavaScript processes more strings in the page layout, potentially requesting even more information from the server in response to user-generated events. Forums, social networking, and other participatory applications allow for user-generated content. This widespread and popular practice actually leaves our glorified string processor (the web app) at risk: if we are not careful, a malicious user can supply a string which, if it appears in the &#8216;wrong&#8217; context, might be interpreted as legitimate JavaScript code by the application. That is, malicious users can execute arbitrary code with the same rights and privileges as the application itself. This vulnerability is known as <a href="">Cross-Site Scripting (XSS)</a>.</p>
<p>So, we find ourselves writing a string processor which must deal with strings of various encodings, special characters, and escape conventions: namely HTML, JavaScript, XML, CSS, and URL. If one of these strings (even from our own database) manages to arrive in a context without first going through a filter to sanitize it, then our application has a security vulnerability. Do you think that it&#8217;s possible to write test cases (or even auto-generate them) given all the code paths, all the different sources (user, cookie, URL, database, etc.) and all the contexts in which a string might appear? In my opinion, the exponential complexity makes testing an infeasible approach. What we really need, then, is a proof system to verify that no strings end up in the wrong context.</p>
<h4>Static Typing to the Rescue</h4>
<p>If we are willing to go back to our application and examine it in detail, we find that we should really be treating each of the above strings as different types. HtmlString should be a different type from JSString, which are again both different from UrlString. Simply expressing each context as a different type enables our static typing system to verify that we never use the wrong kind of string in the wrong context. We can also provide explicit conversion functions, which provide the proper escaping and sanitization when moving from one context to another.</p>

<div class="wp_syntax"><div class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">void</span> addToDocument<span style="color: #008000;">&#40;</span>HtmlString hStr<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
HtmlString fromURL<span style="color: #008000;">&#40;</span>UrlString uStr<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
UnsafeString HttpRequest<span style="color: #008000;">&#40;</span>UrlString uStr<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span></pre></div></div>
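<p>As a rough sketch of how the declarations above could look in Java: the names <code>TypedStrings</code>, <code>escape</code>, and <code>Document</code> are hypothetical and chosen only for illustration, and the escaping shown covers only the basic HTML special characters. The point is that <code>addToDocument</code> accepts nothing but an <code>HtmlString</code>, and the only way to obtain one from untrusted text is to go through the conversion function.</p>

```java
// Sketch only: all names here are assumptions made for illustration.
final class TypedStrings {

    // Text from an untrusted source (user input, cookie, URL, database).
    static final class UnsafeString {
        private final String raw;
        UnsafeString(String raw) { this.raw = raw; }
    }

    // Text that is safe to place in an HTML context.
    // The private constructor means it can only be produced inside TypedStrings.
    static final class HtmlString {
        private final String markup;
        private HtmlString(String markup) { this.markup = markup; }
        String markup() { return markup; }
    }

    // Explicit conversion: the only route from UnsafeString to HtmlString.
    static HtmlString escape(UnsafeString s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.raw.toCharArray()) {
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return new HtmlString(out.toString());
    }

    // A stand-in for the page being built.
    static final class Document {
        private final StringBuilder body = new StringBuilder();
        void addToDocument(HtmlString h) { body.append(h.markup()); }
        String render() { return body.toString(); }
    }
}
```

<p>With these types, <code>doc.addToDocument(new TypedStrings.UnsafeString("..."))</code> is rejected at compile time, while <code>doc.addToDocument(TypedStrings.escape(...))</code> is accepted.</p>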

<h4>Language Support</h4>
<p>What&#8217;s most unfortunate about this approach is that neither C++ nor Java provides us with an easy way to distinguish two strings. We certainly don&#8217;t want to use C&#8217;s <code>typedef</code>, because it only creates an alias for the same underlying type, so the different kinds of string remain freely interchangeable, which defeats the point. So, we&#8217;re forced into creating a separate class for each of these strings, including implementing all the operators that make for convenient string manipulation. I&#8217;d really love a language that would allow me to extend my existing string type without fully re-implementing everything, yet still be able to treat the extension as a completely different type.</p>
<h4>Conclusion</h4>
<p>Essentially we&#8217;re using the static typing as a proof system to constrain our programming practices. The static type verification provides a proof that we never use a string in the wrong context. In my opinion, this coding technique is of enormous benefit, and represents a use-case that dynamic typing + unit testing simply cannot approach.</p>
<p>The real trick is recognizing that two strings aren&#8217;t necessarily the same type.</p>
<p>Just for reference, I did not come up with this example myself.<br />
Joel Spolsky <a href="http://www.joelonsoftware.com/articles/Wrong.html">advocates</a> using Hungarian notation, which I think is too weak for solving security vulnerabilities.<br />
Tom Moertel provides an <a href="http://blog.moertel.com/articles/2006/10/18/a-type-based-solution-to-the-strings-problem">implementation</a> of this approach in Haskell.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.cogitolingua.net/blog/2011/11/11/strong-typing-for-security/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Future of Publishing should be Skribilo</title>
		<link>http://www.cogitolingua.net/blog/2011/08/18/the-future-of-publishing-should-be-skribilo/</link>
		<comments>http://www.cogitolingua.net/blog/2011/08/18/the-future-of-publishing-should-be-skribilo/#comments</comments>
		<pubDate>Thu, 18 Aug 2011 23:28:29 +0000</pubDate>
		<dc:creator>erich</dc:creator>
				<category><![CDATA[Language]]></category>
		<category><![CDATA[Punditry]]></category>
		<category><![CDATA[Tech*]]></category>

		<guid isPermaLink="false">http://www.cogitolingua.net/blog/?p=699</guid>
		<description><![CDATA[<p>Finally, I found something that looks like it could suitably replace LaTeX! It&#8217;s called Skribilo and features all of the goodness observed in a previous post about using a Lisp-like syntax instead of that crufty HTML/XML nonsense.</p> ]]></description>
			<content:encoded><![CDATA[<p>Finally, I found something that looks like it could suitably replace LaTeX! It&#8217;s called <a href="http://www.nongnu.org/skribilo/">Skribilo</a> and features all of the goodness observed in a <a href="http://www.cogitolingua.net/blog//2010/12/21/the-future-of-the-web-should-be-lisp/">previous post</a> about using a Lisp-like syntax instead of that crufty HTML/XML nonsense.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.cogitolingua.net/blog/2011/08/18/the-future-of-publishing-should-be-skribilo/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A language should be focused on writing Internal DSLs</title>
		<link>http://www.cogitolingua.net/blog/2011/05/18/a-language-should-be-focused-on-writing-internal-dsls/</link>
		<comments>http://www.cogitolingua.net/blog/2011/05/18/a-language-should-be-focused-on-writing-internal-dsls/#comments</comments>
		<pubDate>Thu, 19 May 2011 06:21:27 +0000</pubDate>
		<dc:creator>erich</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[Comp*]]></category>
		<category><![CDATA[Ideas]]></category>
		<category><![CDATA[Language]]></category>

		<guid isPermaLink="false">http://www.cogitolingua.net/blog/?p=527</guid>
		<description><![CDATA[<p>I&#8217;ve been reading Martin Fowler&#8217;s book, Domain-Specific Languages, this weekend. He covered a number of ways in which you can structure your code to achieve what he terms an Internal DSL. Quite a bit is focused on the discussion of a fluent interface. It turns out in many languages there are only so many ways [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been reading Martin Fowler&#8217;s book, <em>Domain-Specific Languages</em>, this weekend. He covered a number of ways in which you can structure your code to achieve what he terms an <em>Internal DSL</em>. Quite a bit is focused on the discussion of a <a href="http://en.wikipedia.org/wiki/Fluent_interface">fluent interface</a>. It turns out in many languages there are only so many ways to structure the code so it looks like a DSL rather than the host language. Chapter 4 of the book explores a number of these techniques, and summarizes the results in a nice table, mapping language structure (for a DSL) to host language patterns:</p>
<table width="100%">
<tr>
<th>Structure</th>
<th>BNF</th>
<th>Consider&#8230;</th>
</tr>
<tr>
<td>Mandatory list</td>
<td><code>parent ::= first second third</code></td>
<td>Nested Function</td>
</tr>
<tr>
<td>Optional list</td>
<td><code>parent ::= first maybeSecond? maybeThird?</code></td>
<td>Method Chaining, Literal Map</td>
</tr>
<tr>
<td>Homogeneous bag</td>
<td><code>parent ::= child*</code></td>
<td>Literal List, Function Sequence</td>
</tr>
<tr>
<td>Heterogeneous bag</td>
<td><code>parent ::= (this | that | theOther)*</code></td>
<td>Method Chaining</td>
</tr>
<tr>
<td>Set</td>
<td><code>n/a</code></td>
<td>Literal Map</td>
</tr>
</table>
<p>What&#8217;s interesting about this mapping is that we can use it to engineer a language whose sole purpose would be to solve problems by authoring and combining DSLs (what the Lispers are always claiming they do through the use of macros).<br />
The focus here is to get a function call specification that can do a good job of representing all the control structures that are present within a grammar.<br />
So let&#8217;s take a careful look at <a href="http://en.wikipedia.org/wiki/Wirth_syntax_notation">Wirth&#8217;s syntax notation</a> to get a good idea of what needs to be modeled.</p>
<ul>
<li><b>Sequencing</b> is conveyed by a space-delimited list of grammar elements: <b>a b c d</b></li>
<li><b>Repetition</b> is denoted by curly brackets: <b>{a}</b> stands for <b>&epsilon; | a | aa | aaa | &#8230;</b></li>
<li><b>Optionality</b> is expressed by square brackets: <b>[a]b</b> stands for <b>ab | b</b></li>
<li><b>Grouping</b> is indicated by parentheses: <b>(a|b)c</b> stands for <b>ac | bc</b></li>
</ul>
<p>Fowler gives some interesting suggestions, both in the above table and in the surrounding discussion within the book. I&#8217;d like to take some space here and explore each of the options, and how it relates to a Wirth-style grammar. Throughout this exposition, we&#8217;ll be using programming language constructs such as function calls and scoping to model the grammar structures listed in the above table. We&#8217;ll also assume that each grammar rule can be made to correspond to either a method or a type (or both) in the implementor&#8217;s programming language.</p>
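<p>To make the rule-to-type correspondence a little more concrete, here is a minimal Java sketch under my own assumptions. It borrows the <code>ComputerT</code>-style names from the computer example used throughout this post, leaves out the architecture argument for brevity, and wraps everything in a hypothetical <code>NestedFunctions</code> class. Each grammar rule gets both a type and a function, so a sequence in the wrong order simply fails to type-check.</p>

```java
// Sketch only: one type and one function per grammar rule.
final class NestedFunctions {
    static final class CoresT { final int count; CoresT(int count) { this.count = count; } }
    static final class SpeedT { final int value; SpeedT(int value) { this.value = value; } }
    static final class SizeT  { final int value; SizeT(int value)  { this.value = value; } }

    static final class ProcessorT {
        final CoresT cores; final SpeedT speed;
        ProcessorT(CoresT cores, SpeedT speed) { this.cores = cores; this.speed = speed; }
    }
    static final class DiskT {
        final SizeT size;
        DiskT(SizeT size) { this.size = size; }
    }
    static final class ComputerT {
        final ProcessorT processor; final DiskT disk;
        ComputerT(ProcessorT processor, DiskT disk) { this.processor = processor; this.disk = disk; }
    }

    // Leaf rules.
    static CoresT cores(int n) { return new CoresT(n); }
    static SpeedT speed(int v) { return new SpeedT(v); }
    static SizeT  size(int v)  { return new SizeT(v); }

    // Sequencing: processor ::= cores speed, computer ::= processor disk.
    static ProcessorT processor(CoresT c, SpeedT s) { return new ProcessorT(c, s); }
    static DiskT disk(SizeT s) { return new DiskT(s); }
    static ComputerT computer(ProcessorT p, DiskT d) { return new ComputerT(p, d); }

    static ComputerT example() {
        // Swapping the arguments, e.g. computer(disk(...), processor(...)),
        // would be a compile-time error rather than a runtime surprise.
        return computer(processor(cores(2), speed(2500)), disk(size(150)));
    }
}
```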
<ul>
<li><b>Mandatory list.</b> Within a grammar, a mandatory list is indistinguishable from a sequence. Within a programming language, the mandatory list can also be expressed as a list of arguments that must be provided to a function. We can even invoke compile-time checking by having each grammar rule correspond to a type. Fowler doesn&#8217;t quite go that far, but he does give this illustrative example:<br />
    <center></p>
<table width="100%">
<tr>
<th>Nested Function</th>
<th>Example declarations</th>
</tr>
<tr>
<td>

<div class="wp_syntax"><div class="code"><pre class="cpp" style="font-family:monospace;">computer<span style="color: #008000;">&#40;</span>
    processor<span style="color: #008000;">&#40;</span>
        cores<span style="color: #008000;">&#40;</span><span style="color: #0000dd;">2</span><span style="color: #008000;">&#41;</span>,
        speed<span style="color: #008000;">&#40;</span><span style="color: #0000dd;">2500</span><span style="color: #008000;">&#41;</span>,
        i386
    <span style="color: #008000;">&#41;</span>,
    disk<span style="color: #008000;">&#40;</span>
        size<span style="color: #008000;">&#40;</span><span style="color: #0000dd;">150</span><span style="color: #008000;">&#41;</span>
    <span style="color: #008000;">&#41;</span>,
    disk<span style="color: #008000;">&#40;</span>
        size<span style="color: #008000;">&#40;</span><span style="color: #0000dd;">75</span><span style="color: #008000;">&#41;</span>,
        speed<span style="color: #008000;">&#40;</span><span style="color: #0000dd;">7200</span><span style="color: #008000;">&#41;</span>,
        sata<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>
    <span style="color: #008000;">&#41;</span>
<span style="color: #008000;">&#41;</span></pre></div></div>

</td>
<td>

<div class="wp_syntax"><div class="code"><pre class="cpp" style="font-family:monospace;">ComputerT computer<span style="color: #008000;">&#40;</span>ProcessorT, DiskT<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
ProcessorT processor<span style="color: #008000;">&#40;</span>CoresT, SpeedT, ArchT<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
DiskT disk<span style="color: #008000;">&#40;</span>SizeT<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span></pre></div></div>

</td>
</tr>
</table>
<p>    </center>
</li>
<li><b>Optional List.</b> As long as each list can be clearly expressed, we can achieve an optional list via function overloading. This way the compiler will match up directly to the instance of the correct rule. However, this amount of explicitness comes at a cost: either we keep the number of optional choices in the grammar to a minimum, or we commit ourselves to writing a large number of very similar functions (as the optionality can explode exponentially). If optionality is given in the grammar as a grouping, then it may make sense to create a new function representing only that grouping. This can reduce the amount of function overloading by reducing the number of function signatures that need to be supported. Fowler recommends two options:
<p>    <center></p>
<table width="100%">
<tr>
<th>Method Chaining</th>
<th>Literal Map</th>
</tr>
<tr>
<td>

<div class="wp_syntax"><div class="code"><pre class="cpp" style="font-family:monospace;">computer<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>
    .<span style="color: #007788;">processor</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>
        .<span style="color: #007788;">cores</span><span style="color: #008000;">&#40;</span><span style="color: #0000dd;">2</span><span style="color: #008000;">&#41;</span>
        .<span style="color: #007788;">speed</span><span style="color: #008000;">&#40;</span><span style="color: #0000dd;">2500</span><span style="color: #008000;">&#41;</span>
        .<span style="color: #007788;">i386</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>
    .<span style="color: #007788;">disk</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>
        .<span style="color: #007788;">size</span><span style="color: #008000;">&#40;</span><span style="color: #0000dd;">150</span><span style="color: #008000;">&#41;</span>
    .<span style="color: #007788;">disk</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>
        .<span style="color: #007788;">size</span><span style="color: #008000;">&#40;</span><span style="color: #0000dd;">75</span><span style="color: #008000;">&#41;</span>
        .<span style="color: #007788;">speed</span><span style="color: #008000;">&#40;</span><span style="color: #0000dd;">7200</span><span style="color: #008000;">&#41;</span>
        .<span style="color: #007788;">sata</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>
    .<span style="color: #007788;">end</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span></pre></div></div>

</td>
<td>

<div class="wp_syntax"><div class="code"><pre class="ruby" style="font-family:monospace;">computer<span style="color:#006600; font-weight:bold;">&#40;</span>
    processor<span style="color:#006600; font-weight:bold;">&#40;</span>
        <span style="color:#ff3333; font-weight:bold;">:cores</span> <span style="color:#006600; font-weight:bold;">=&gt;</span> <span style="color:#006666;">2</span>,
        <span style="color:#ff3333; font-weight:bold;">:speed</span> <span style="color:#006600; font-weight:bold;">=&gt;</span> <span style="color:#006666;">2500</span>,
        <span style="color:#ff3333; font-weight:bold;">:type</span> <span style="color:#006600; font-weight:bold;">=&gt;</span> <span style="color:#ff3333; font-weight:bold;">:i386</span>
    <span style="color:#006600; font-weight:bold;">&#41;</span>,
    disk<span style="color:#006600; font-weight:bold;">&#40;</span>
        <span style="color:#ff3333; font-weight:bold;">:size</span> <span style="color:#006600; font-weight:bold;">=&gt;</span> <span style="color:#006666;">150</span>
    <span style="color:#006600; font-weight:bold;">&#41;</span>,
    disk<span style="color:#006600; font-weight:bold;">&#40;</span>
        <span style="color:#ff3333; font-weight:bold;">:size</span> <span style="color:#006600; font-weight:bold;">=&gt;</span> <span style="color:#006666;">75</span>,
        <span style="color:#ff3333; font-weight:bold;">:speed</span> <span style="color:#006600; font-weight:bold;">=&gt;</span> <span style="color:#006666;">7200</span>,
        <span style="color:#ff3333; font-weight:bold;">:interface</span> <span style="color:#006600; font-weight:bold;">=&gt;</span> <span style="color:#ff3333; font-weight:bold;">:sata</span>
    <span style="color:#006600; font-weight:bold;">&#41;</span>
<span style="color:#006600; font-weight:bold;">&#41;</span></pre></div></div>

</td>
</tr>
</table>
<p>    </center></p>
<p>Although the Method Chaining approach readily allows for the assumption of default parameters (for example the speed of the first disk), the hierarchical structure here is not fully captured by the chain of method calls. To support such a chain as that in the example, it would be necessary to hold a ContextVariable that each method automatically recognizes (for example, so that <code>size()</code> can refer to the last <code>disk</code> inserted). Because this context is kept implicit, the compiler will not be able to catch an invalid sequence of method calls (instead this must be caught by validity-checking code at runtime). Furthermore, it is important to keep straight the terminology of each different component. For example, it is not clear whether <code>speed(7200)</code> should refer to the last mentioned disk or to the last mentioned processor. Reshuffling the order of method calls can be highly problematic, and can result in unwanted side-effects if terminology becomes mixed up.</p>
<p>The Literal Map approach works pretty well for dynamic languages (Python and Ruby) which allow functions to accept a dictionary mapping terms to values. It would be nice to have named parameters in more of the static languages (Ada and C# have them, C/C++ and Java do not). Having a statically type-checked language implement this feature, including the ability to overload with default values, would be ideal. There is also nothing that preserves the order of entries (which isn&#8217;t all that important in this example).</p>
</li>
<li><b>Homogeneous Bag.</b> This is directly about modeling the <b>repetition</b> rule in Wirth&#8217;s syntax. In most programming languages, there do not exist easily used mechanisms for passing a <em>variable length list</em> of items.<br />
    <center></p>
<table width="100%">
<tr>
<th>Literal List</th>
<th>Function Sequence</th>
</tr>
<tr>
<td>

<div class="wp_syntax"><div class="code"><pre class="cpp" style="font-family:monospace;">martin.<span style="color: #007788;">follows</span><span style="color: #008000;">&#40;</span>
    <span style="color: #FF0000;">&quot;WardCunningham&quot;</span>,
    <span style="color: #FF0000;">&quot;bigballofmud&quot;</span>,
    <span style="color: #FF0000;">&quot;KentBeck&quot;</span>,
    <span style="color: #FF0000;">&quot;neal4d&quot;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span></pre></div></div>

</td>
<td>

<div class="wp_syntax"><div class="code"><pre class="cpp" style="font-family:monospace;">computer<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    processor<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
        cores<span style="color: #008000;">&#40;</span><span style="color: #0000dd;">2</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
        speed<span style="color: #008000;">&#40;</span><span style="color: #0000dd;">2500</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
        i386<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    disk<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
        size<span style="color: #008000;">&#40;</span><span style="color: #0000dd;">150</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    disk<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
        size<span style="color: #008000;">&#40;</span><span style="color: #0000dd;">75</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
        speed<span style="color: #008000;">&#40;</span><span style="color: #0000dd;">7200</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
        sata<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span></pre></div></div>

</td>
</tr>
</table>
<p>    </center></p>
<p>The Literal List approach can be implemented, in languages such as C, with the varargs mechanism. But it forgoes type-checking, and definitely doesn&#8217;t feel &#8216;built-in&#8217;. Dynamic languages are more permissive in this regard. For example, Python has the <code>*args</code> mechanism, and JavaScript allows access to the <code>arguments</code> array. In either case, it turns out that, aside from explicitly creating your own separate list to pass in, there just really isn&#8217;t a great way to implement this pattern. All is not quite lost, however, as some languages allow for the in-line initialization of arrays. For example,</p>
<ul>
<li>C++ allows passing of a streamed object: <code>m(StringArray() &lt;&lt; "blah" &lt;&lt; "hey" &lt;&lt; "yo")</code>, which isn&#8217;t quite what we want.</li>
<li>Java does a bit better with <code>m(new String[]{"blah", "hey", "yo"});</code> which sacrifices readability, but achieves the desired goal.</li>
<li>Java also allows declaration of a type-checked varargs <code>m(String...);</code> which is called like <code>m("blah", "hey", "yo")</code>, which is precisely what we want.</li>
</ul>
<p>The biggest drawback of using the varargs mechanism is that it lacks sequential composability. You cannot easily declare a function for the rule: <code>A := B* C*</code>. In this case, the more crufty, in-line instantiation mechanisms turn out to be better.</p>
<p>The Function Sequence approach lacks pretty much everything I&#8217;ve considered thus far: its sequencing isn&#8217;t type-checked, terminology can be confused, and it requires a State Variable to maintain context. The only way in which it isn&#8217;t worse than Method Chaining is that a failed assertion can tell you on what line you crashed, provided you responsibly littered your code with <code>assert</code>s.</p>
</li>
<li><b>Heterogeneous Bag.</b> Because this is more about <b>repetition</b> of <b>optionality</b>, it&#8217;s probably best handled by combining previous approaches. For example, you could choose to factor out the group as its own separate type, and then have a vararg listing of these types. Method Chaining can be used for a list of configuration parameters, especially if order isn&#8217;t really the concern. You couldn&#8217;t implement this with a dictionary or map though, because of the repetition aspect.
</li>
<li><b>Set.</b> Clearly the only approach to implementing an unordered collection of possibly heterogeneous types is a dictionary or associative array, best represented in the Literal Map approach. Again, we&#8217;d like to have a clean syntax for this in our programming languages: type-checked keyword arguments.</li>
</ul>
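<p>The ContextVariable idea from the Method Chaining discussion in the list above can be sketched roughly as follows. This is my own illustration in Java, not Fowler&#8217;s code: the <code>ComputerBuilder</code> class, its <code>Context</code> enum, and the string it accumulates are all assumptions. Notice that an invalid sequence such as <code>disk().cores(2)</code> compiles without complaint and is only rejected when it runs.</p>

```java
// Sketch only: a method-chaining builder with an implicit context variable.
final class ComputerBuilder {
    // The context variable: which component the following calls apply to.
    private enum Context { NONE, PROCESSOR, DISK }

    private Context context = Context.NONE;
    private final StringBuilder spec = new StringBuilder("computer");

    ComputerBuilder processor() {
        context = Context.PROCESSOR;
        spec.append(" processor");
        return this;
    }

    ComputerBuilder disk() {
        context = Context.DISK;
        spec.append(" disk");
        return this;
    }

    ComputerBuilder cores(int n) {
        requireContext(Context.PROCESSOR, "cores");
        spec.append(" cores=").append(n);
        return this;
    }

    ComputerBuilder size(int value) {
        requireContext(Context.DISK, "size");
        spec.append(" size=").append(value);
        return this;
    }

    // Ambiguous terminology: speed() applies to whichever component came last.
    ComputerBuilder speed(int value) {
        if (context == Context.NONE) {
            throw new IllegalStateException("speed() called before any component");
        }
        spec.append(" speed=").append(value);
        return this;
    }

    String end() { return spec.toString(); }

    // Runtime validity check standing in for what the compiler cannot see.
    private void requireContext(Context expected, String method) {
        if (context != expected) {
            throw new IllegalStateException(method + "() called in context " + context);
        }
    }
}
```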
<p>For me, the most important aspect of this exposition has been the progressive realization that type-checked keyword arguments and inline-initialization are necessary elements for supporting a mini-DSL within your own programming language. Without these features, syntax too easily obscures what you are trying to express. It&#8217;s probably worth complicating both the semantics and parsing of the host language to provide these elements.</p>
<p>We also find that, using function calls as our implementation interface, the most difficult aspect of Wirth&#8217;s syntax is <b>repetition</b>. All the other rules fit very nicely into existing function interfaces. It would be very nice for more languages to support varargs in the way that Java has. However, I would like for this to be extended a bit further: it should be possible to have multiple vararg arguments, as long as they are type-distinguishable. For example, <code>m(String..., Integer...)</code>, should be callable with <code>m("a", "b", "c", 3, 1, 4, 1, 5, 9)</code>.</p>
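<p>Java as it stands allows only one varargs parameter per method, and it must come last, so <code>m(String..., Integer...)</code> cannot be declared directly. A possible workaround, sketched below with hypothetical names (<code>Repetition</code>, <code>strings</code>, <code>ints</code>), is to let small varargs helpers build the arrays in-line and pass those to a method that takes two array parameters. It is noisier than the wished-for syntax, but it stays type-checked and models a rule like <code>A := B* C*</code>.</p>

```java
// Sketch only: approximating multiple vararg groups with in-line array helpers.
final class Repetition {
    // A single type-checked varargs parameter models parent ::= child*.
    static int follows(String... names) {
        return names.length;
    }

    // Helpers that turn an in-line argument list into a typed array.
    static String[] strings(String... items) { return items; }
    static int[] ints(int... items) { return items; }

    // Two repetition groups, each passed as its own array: A := B* C*.
    static String m(String[] words, int[] numbers) {
        return words.length + " words, " + numbers.length + " numbers";
    }
}
```

<p>A call then reads <code>Repetition.m(strings("a", "b", "c"), ints(3, 1, 4, 1, 5, 9))</code>, given static imports of the two helpers.</p>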
]]></content:encoded>
			<wfw:commentRss>http://www.cogitolingua.net/blog/2011/05/18/a-language-should-be-focused-on-writing-internal-dsls/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Considerations of Programming Language Design</title>
		<link>http://www.cogitolingua.net/blog/2011/05/11/considerations-of-programming-language-design/</link>
		<comments>http://www.cogitolingua.net/blog/2011/05/11/considerations-of-programming-language-design/#comments</comments>
		<pubDate>Thu, 12 May 2011 06:27:55 +0000</pubDate>
		<dc:creator>erich</dc:creator>
				<category><![CDATA[Comp*]]></category>
		<category><![CDATA[Language]]></category>

		<guid isPermaLink="false">http://www.cogitolingua.net/blog/?p=517</guid>
		<description><![CDATA[<p>Reddit modded up a nice review of Considerations When Designing your Own Programming/Scripting Language (it&#8217;s worth following the links provided there to Clementson&#8217;s Blog, to get a larger picture of the issue).</p> <p>There&#8217;s really a ton of stuff to think about. Mostly the field of computer science concerns itself with taming complexity. All too often [...]]]></description>
			<content:encoded><![CDATA[<p>Reddit modded up a nice review of <a href="http://www.redmountainsw.com/wordpress/archives/considerations-when-designing-your-own-programmingscripting-language">Considerations When Designing your Own Programming/Scripting Language</a> (it&#8217;s worth following the links provided there to Clementson&#8217;s Blog, to get a larger picture of the issue).</p>
<p>There&#8217;s really a ton of stuff to think about. Mostly the field of computer science concerns itself with taming complexity. All too often software projects buckle under the weight of feature creep or code bloat. Eventually, in the career of any decent programmer, the ability to clearly see and identify the common organizational patterns becomes a core focus. Anyone with exposure to several different languages notices that these organizational patterns are expressed differently among the babble, and that even the same old problems (e.g. parsing, printf, or search/logic) can be cast in a completely different light by a shift in linguistic perspective.</p>
<p>We see that language design suffers exactly the same problems as other domains. But, because it is a language, botching your solution to these problems will affect the <em>way</em> that programmers solve their problems. One such non-trivial task, which involves many sub-problems that can clearly bring these issues to light, is that of writing a compiler. Compilers involve a great deal of data structures and algorithms: text-manipulation, parsing, trees, graphs, fixed-point algorithms (dataflow analysis), NP-complete problems (register allocation, instruction scheduling). From this simple observation, we see that we should follow Wirth&#8217;s language-complexity metric: that languages should be compared based on the relative sizes of their self-compilers. (It really is an elegant fixed-point metric to the circularity involved in the complexity-of-expression/complexity-of-implementation trade-off).</p>
<p>So, to highlight these ideas, let&#8217;s take a particularly poor specimen: C++. This linguistic monstrosity is both verbose and inconvenient. Yes, it lets you get close to the &#8216;bare-metal&#8217;, and it requires you to really think about what you are doing (to a fault, actually: I&#8217;ve noticed that the more C++ I learn, the more tricky and horrendous my bugs become). But, even though I use it most often, I can&#8217;t help but feel that it scores incredibly poorly in the design space. The features don&#8217;t interact well together: inheritance vs. templates, memory management vs. exceptions, etc. The lower-level exposure consistently prevents higher-level conveniences. For example, pointer arithmetic prevents garbage collection. (Although garbage collection was proposed for C++0x, it got thrown out at the last minute; hopefully it will be introduced eventually. I&#8217;m not sure I want to know what horrible contortions of logic must be followed to actually implement it.) Finally, it should be noted that the complexity of implementing C++ in C++ is incredibly daunting.</p>
<p>So, designing a language is incredibly difficult. The implementation details of certain features can easily wreak havoc on the design. What&#8217;s it take to implement closures? Do you want a fast linear stack, or would you like to have continuations? Do you want type-safety or does that drive you to drop first-class functions out of implementor&#8217;s laziness? Nor do the features always work well together, as demonstrated by the C++ trade-offs above. Yes, before designing your own language, take great heed of the accumulated wisdom within that introductory post.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.cogitolingua.net/blog/2011/05/11/considerations-of-programming-language-design/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Method Extensions</title>
		<link>http://www.cogitolingua.net/blog/2010/12/24/method-extensions/</link>
		<comments>http://www.cogitolingua.net/blog/2010/12/24/method-extensions/#comments</comments>
		<pubDate>Sat, 25 Dec 2010 06:51:31 +0000</pubDate>
		<dc:creator>erich</dc:creator>
				<category><![CDATA[Comp*]]></category>
		<category><![CDATA[Ideas]]></category>
		<category><![CDATA[Language]]></category>

		<guid isPermaLink="false">http://www.cogitolingua.net/blog/?p=497</guid>
		<description><![CDATA[<p>I&#8217;ve come across another programming language feature that I would like to have. The last one was a bit outlandish, and I&#8217;d really like to refine it a bit. Dress it up a little.</p> <p>Supposing you were asked to perform the well known Dijkstra&#8217;s Algorithm. Someone hands you a graph of generic items, let&#8217;s call [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve come across another programming language feature that I would like to have. The <a href="http://www.cogitolingua.net/blog//2010/11/07/interface-extension">last one</a> was a bit outlandish, and I&#8217;d really like to refine it a bit. Dress it up a little.</p>
<p>Suppose you were asked to implement the well-known <a href="http://en.wikipedia.org/wiki/Dijkstra's_algorithm">Dijkstra&#8217;s Algorithm</a>. Someone hands you a graph of generic items, let&#8217;s call them <code>Node</code>s, and you know that each <code>Node</code> has a <code>List&lt;Node*&gt;</code> of edges. So far, so good.</p>
<p>Now, Dijkstra&#8217;s Algorithm works pretty well if you are allowed to tag or mark the Nodes in some way. If you have control over the Node class, then you might be tempted to &#8216;gift&#8217; the class with a member field to hold this tag. But, you should resist this temptation, because it&#8217;s not really appropriate.</p>
<p>It will pollute your class. You are increasing the size of each and every Node, regardless of whether it participates in Dijkstra&#8217;s Algorithm. The additional field, if not commented, might confuse you later. If you&#8217;re paranoid you might never touch it again, even after removing Dijkstra&#8217;s Algorithm from your codebase. Even if it is commented, you might later hijack the field for use in other algorithms for which marking or tagging is convenient, cleverly using the same field for dual missions, and later confusing yourself or getting into an inconsistent state.</p>
<p>So, we don&#8217;t want to make an additional field. Fine, what other options are there?</p>
<p>We could subclass <code>Node</code> to make a <code>MarkableNode</code>. But, in the implementation given above, we see that <code>Node</code>s point only to other <code>Node</code>s. To really give a seamless touch to the algorithm, we&#8217;d have to make a complete copy of the graph we were given, out of the <code>MarkableNode</code>s.</p>
<p>Alternatively, we could write a wrapper class, <code>MarkedNode</code>, that wraps a <code>Node*</code>. This will work generically, yet you have to write a ton of other features for equality tests, worry about creating two different <code>MarkedNode</code>s around the same <code>Node*</code>, etc. What a crapload of tedious busywork.</p>

<div class="wp_syntax"><div class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">class</span> MarkedNode
<span style="color: #008000;">&#123;</span>
<span style="color: #0000ff;">public</span><span style="color: #008080;">:</span>
    MarkedNode<span style="color: #008000;">&#40;</span>Node <span style="color: #000040;">*</span>node<span style="color: #008000;">&#41;</span>
        <span style="color: #008080;">:</span> m_node<span style="color: #008000;">&#40;</span>node<span style="color: #008000;">&#41;</span>
        , m_mark<span style="color: #008000;">&#40;</span><span style="color: #0000ff;">false</span><span style="color: #008000;">&#41;</span> <span style="color: #008000;">&#123;</span>
    <span style="color: #008000;">&#125;</span>
&nbsp;
    <span style="color: #0000ff;">void</span> mark<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span> <span style="color: #008000;">&#123;</span>
        m_mark <span style="color: #000080;">=</span> <span style="color: #0000ff;">true</span><span style="color: #008080;">;</span>
    <span style="color: #008000;">&#125;</span>
&nbsp;
<span style="color: #0000ff;">private</span><span style="color: #008080;">:</span>
    Node <span style="color: #000040;">*</span>m_node<span style="color: #008080;">;</span>
    <span style="color: #0000ff;">bool</span> m_mark<span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span><span style="color: #008080;">;</span></pre></div></div>

<p>How &#8217;bout this last option? We could temporarily attach to <code>Node*</code> a method (and storage) for marking the nodes. This method would only exist in the scope where it&#8217;s needed, so it won&#8217;t pollute the <code>Node</code> class. It would behave like a subclass or a wrapper class, but would need to be syntactically lighter weight, and semantically the same as having a <code>Node*</code> except for the specific extensions for marking.</p>

<div class="wp_syntax"><div class="code"><pre class="cpp" style="font-family:monospace;">Class MarkedNode <span style="color: #008080;">:</span> extends Node<span style="color: #000040;">*</span>
<span style="color: #008000;">&#123;</span>
    m_marked <span style="color: #000080;">=</span> <span style="color: #0000ff;">false</span><span style="color: #008080;">;</span>
&nbsp;
<span style="color: #0000ff;">public</span><span style="color: #008080;">:</span>
    <span style="color: #0000ff;">void</span> mark<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span> <span style="color: #008000;">&#123;</span>
        m_marked <span style="color: #000080;">=</span> <span style="color: #0000ff;">true</span><span style="color: #008080;">;</span>
    <span style="color: #008000;">&#125;</span>
<span style="color: #008000;">&#125;</span></pre></div></div>

<p>So, essentially we can call this method on any <code>Node*</code>, and we allow the compiler to figure out how it will map (1-to-1) each <code>m_marked</code> field to each <code>Node*</code>. We also let the compiler figure out how to mangle the <code>mark()</code> method. Doing the mapping of <code>Node*</code> to extended data will be tricky, especially if we want to clear out the data when a <code>Node*</code> is deleted.</p>
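<p>For what it&#8217;s worth, here&#8217;s roughly what I imagine the compiler would have to generate behind the scenes, written out by hand in today&#8217;s C++. This is only a sketch under my own assumptions: the <code>Node</code> struct is a stand-in for the real class, and <code>NodeMarks</code>, <code>isMarked()</code>, and <code>forget()</code> are names I made up for illustration.</p>

<div class="wp_syntax"><div class="code"><pre class="cpp" style="font-family:monospace;">#include &lt;unordered_map&gt;
#include &lt;vector&gt;

// Stand-in for the Node class from the post; the real one is not shown.
struct Node {
    std::vector&lt;Node*&gt; edges;
};

// Hypothetical helper: keeps the marks in a side table keyed by Node*,
// so Node itself never grows a field.
class NodeMarks {
public:
    void mark(const Node* node) { m_marks[node] = true; }

    // A Node that was never marked simply has no entry in the table.
    bool isMarked(const Node* node) const {
        auto it = m_marks.find(node);
        return it != m_marks.end() &amp;&amp; it-&gt;second;
    }

    // Has to be called by hand before a Node is deleted; nothing
    // automates this, which is exactly the tricky part.
    void forget(const Node* node) { m_marks.erase(node); }

private:
    std::unordered_map&lt;const Node*, bool&gt; m_marks;
};</pre></div></div>

<p>You&#8217;d create one <code>NodeMarks</code> inside the function that runs the algorithm, and the marks vanish when it goes out of scope. That gets the scoping I want, but none of the syntactic lightness, and a stale <code>Node*</code> left in the table is still my problem rather than the compiler&#8217;s.</p>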
]]></content:encoded>
			<wfw:commentRss>http://www.cogitolingua.net/blog/2010/12/24/method-extensions/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Future of the Web should be Lisp</title>
		<link>http://www.cogitolingua.net/blog/2010/12/21/the-future-of-the-web-should-be-lisp/</link>
		<comments>http://www.cogitolingua.net/blog/2010/12/21/the-future-of-the-web-should-be-lisp/#comments</comments>
		<pubDate>Tue, 21 Dec 2010 08:25:25 +0000</pubDate>
		<dc:creator>erich</dc:creator>
				<category><![CDATA[Ideas]]></category>
		<category><![CDATA[Language]]></category>
		<category><![CDATA[Punditry]]></category>
		<category><![CDATA[Tech*]]></category>

		<guid isPermaLink="false">http://www.cogitolingua.net/blog/?p=424</guid>
		<description><![CDATA[<p>I was reading Steve Yegge&#8217;s drunken rant on The Emacs Problem. It wasn&#8217;t able to convince me that Lisp was a great language for text processing, but it did convince me that Lisp is a fantastic language for data interchange. Especially, if that data happens to have hierarchical structure. Say for example, something like HTML.</p> [...]]]></description>
			<content:encoded><![CDATA[<p>I was reading Steve Yegge&#8217;s drunken rant on <a href="http://sites.google.com/site/steveyegge2/the-emacs-problem">The Emacs Problem</a>. It wasn&#8217;t able to convince me that Lisp was a great language for text processing, but it did convince me that Lisp is a fantastic language for data interchange. Especially, if that data happens to have hierarchical structure. Say for example, something like HTML.</p>
<p>Steve was kind enough to point out a really nice XML logfile example, which I reproduce here:</p>
<table>
<tr>
<td>
<pre>&lt;<font color="#0000ff">?</font><font color="#b22222">xml</font> <font color="#a0522d">version</font>=<font color="#0000ff">"</font>1.0<font color="#0000ff">"</font> <font color="#a0522d">encoding</font>=<font color="#0000ff">"</font>utf-8<font color="#0000ff">"</font> <font color="#a0522d">standalone</font>=<font color="#0000ff">"</font>no<font color="#0000ff">"</font><font color="#0000ff">?&gt;</font>
&lt;<font color="#0000ff">!</font><font color="#b22222">DOCTYPE</font> log <font color="#b22222">SYSTEM</font> <font color="#0000ff">"</font>logger.dtd<font color="#0000ff">"&gt;</font>
&lt;<font color="#b22222">log&gt;</font>
&lt;<font color="#b22222">record&gt;</font>
  &lt;<font color="#b22222">date&gt;</font>2005-02-21T18:57:39&lt;<font color="#b22222">/</font><font color="#b22222">date&gt;</font>
  &lt;<font color="#b22222">millis&gt;</font>1109041059800&lt;<font color="#b22222">/</font><font color="#b22222">millis&gt;</font>
  &lt;<font color="#b22222">sequence&gt;</font>1&lt;<font color="#b22222">/</font><font color="#b22222">sequence&gt;</font>
  &lt;<font color="#b22222">logger&gt;</font><font color="#0000ff">&lt;</font><font color="#b22222">/</font><font color="#b22222">logger&gt;</font>
  &lt;<font color="#b22222">level&gt;</font>SEVERE&lt;<font color="#b22222">/</font><font color="#b22222">level&gt;</font>
  &lt;<font color="#b22222">class&gt;</font>java.util.logging.LogManager$RootLogger&lt;<font color="#b22222">/</font><font color="#b22222">class&gt;</font>
  &lt;<font color="#b22222">method&gt;</font>log&lt;<font color="#b22222">/</font><font color="#b22222">method&gt;</font>
  &lt;<font color="#b22222">thread&gt;</font>10&lt;<font color="#b22222">/</font><font color="#b22222">thread&gt;</font>
  &lt;<font color="#b22222">message&gt;</font>A very very bad thing has happened!&lt;<font color="#b22222">/</font><font color="#b22222">message&gt;</font>
  &lt;<font color="#b22222">exception&gt;</font>
    &lt;<font color="#b22222">message&gt;</font>java.lang.Exception&lt;<font color="#b22222">/</font><font color="#b22222">message&gt;</font>
    &lt;<font color="#b22222">frame&gt;</font>
      &lt;<font color="#b22222">class&gt;</font>logtest&lt;<font color="#b22222">/</font><font color="#b22222">class&gt;</font>
      &lt;<font color="#b22222">method&gt;</font>main&lt;<font color="#b22222">/</font><font color="#b22222">method&gt;</font>
      &lt;<font color="#b22222">line&gt;</font>30&lt;<font color="#b22222">/</font><font color="#b22222">line&gt;</font>
    &lt;<font color="#b22222">/</font><font color="#b22222">frame&gt;</font>
  &lt;<font color="#b22222">/</font><font color="#b22222">exception&gt;</font>
&lt;<font color="#b22222">/</font><font color="#b22222">record&gt;</font>
&lt;<font color="#b22222">/</font><font color="#b22222">log&gt;</font>
</pre>
</td>
<td>
<pre>(<font color="firebrick">log</font>
'(<font color="firebrick">record</font>
  (<font color="firebrick">date</font> <font color="RoyalBlue">"2005-02-21T18:57:39"</font>)
  (<font color="firebrick">millis</font> 1109041059800)
  (<font color="firebrick">sequence</font> 1)
  (<font color="firebrick">logger</font> nil)
  (<font color="firebrick">level</font> <font color="red">'SEVERE</font>)
  (<font color="firebrick">class</font> <font color="RoyalBlue">"java.util.logging.LogManager$RootLogger"</font>)
  (<font color="firebrick">method</font> <font color="DarkGreen">'log</font>)
  (<font color="firebrick">thread</font> 10)
  (<font color="firebrick">message</font> <font color="RoyalBlue">"A very very bad thing has happened!"</font>)
  (<font color="firebrick">exception</font>
    (<font color="firebrick">message</font> <font color="RoyalBlue">"java.lang.Exception"</font>)
    (<font color="firebrick">frame</font>
      (<font color="firebrick">class</font> <font color="RoyalBlue">"logtest"</font>)
      (<font color="firebrick">method</font> <font color="DarkGreen">'main</font>)
      (<font color="firebrick">line</font> 30)))))
</pre>
</td>
</tr>
</table>
<p>What&#8217;s super-amazingly-awesome about this transformation is three-fold:</p>
<ol>
<li>The transformation is structure-preserving.</li>
<li>The syntax is orders of magnitude simpler.</li>
<li>The tags can be interpreted as Lisp functions.</li>
</ol>
<p>Let&#8217;s focus on these in more detail:<br />
Because the transformation is structure-preserving, the transition is theoretically achievable. The Web is currently a huge stinking polyglot of HTML, XML, XHTML, JavaScript, and XMLHttpRequest. It&#8217;s been festering, and each time someone scratches an itch, we all have to deal with that solution and its interactions with existing technologies. Right now we have a huge Tower of Babel, and the web browser has to support it all!</p>
<p>In my work as a web security researcher, I&#8217;ve come to the conclusion that the root of code injection attacks is precisely this polyglot monstrosity! If the web had a simpler, unified syntax for all its technologies, many of these problems would go away, and the remaining ones could be more easily mitigated. No more special cases means less buggy code, fewer opportunities for things to go wrong, and a lower profile exposed to attacks.</p>
<p>Finally, because we&#8217;ve encoded the HTML data as a set of Lisp lists, the document can easily become self-modifying! HTML was envisioned to hold static documents, and roughly describe their structure to a browser that would render them. This worked well back in the early days, when all we had was some ASCII pr0n, Star Trek lore, and home pages of CERN employees. But over time, as more people started using the web, we craved more exciting things. For example, all the people on Geocities wanted that &lt;blink&gt; tag that made me want to scratch out my own eyes, but prevented that by triggering an epileptic fit.</p>
<p>Eventually, businesses got in on the action. And they had frighteningly different demands: they wanted more automation, they wanted glitz that would attract users. It wasn&#8217;t enough to have a server-side script create and deliver a page based on what&#8217;s currently present in the inventory database. No! What they wanted was User Interaction. How do you make HTML more dynamic? You have to give it the ability to self-modify. But HTML isn&#8217;t a programming language, it&#8217;s a document layout language!</p>
<p>Enter JavaScript. Netscape (now Mozilla) birthed a language that would allow HTML pages to self-modify and self-introspect, and respond to user interactions. People, Businesses, Everybody just ate it up. There&#8217;s more JavaScript now than any other language.</p>
<p>Not only has the introduction of JavaScript compounded the polyglot problem, it introduced a whole new class of security risks because of the self-modifying capabilities. Now, your browser actually downloads code from anywhere on the web, and happily executes it. Demand for dynamic content so overwhelmed folks at the time that nobody seriously questioned the security risks! A very common attack nowadays is the XSS attack. If you can find a way to get JavaScript onto a page (say by posting it on a web forum) then you can take control of every browser that sets eyes on that page. This was how <a href="http://namb.la/popular/tech.html">Samy (my Hero!) took out MySpace</a>.</p>
<p>I&#8217;m not going to spend any more space here arguing against the idea of a self-modifying document. It&#8217;s way too late for that. AJAX applications like maps and mail are way too useful.</p>
<p>Let&#8217;s look at where the web is currently headed. I&#8217;ve heard about using web architecture as a service. I&#8217;ve heard about using it for application delivery. The natural extension here is that your browser becomes the next operating-system in a box. But is this nasty polyglot, ad-hoc model of languages and their unholy spaghetti of interaction really the way to achieve that? It&#8217;s probably going to happen anyway. We can already see how: nobody writes HTML and JavaScript now, it&#8217;s all machine generated. Generated by Rails, and other web frameworks. When machine architectures became too tedious, we stopped programming assembly. We let the compilers figure it out. Now that web programming has become a nightmare, we turn to the frameworks to save us. Let the frameworks figure it out. The frameworks have become the compilers of the web.</p>
<p>But if you are really just going to machine generate so much&#8230; Why stick with the crufty interfaces? Why not replace them with what I now consider the best data-interchange format of all time? Is Lisp really that bad?</p>
<p>There&#8217;s one other feature that Steve mentioned in his article that I haven&#8217;t addressed yet. Suppose that we decide to replace that HTML with Lisp, then what? How do we get back those dynamic pages? Well, look at that example again. Go on, look. I&#8217;ll wait.</p>
<p>It&#8217;s in Lisp. That means it&#8217;s potentially executable. Each of those entries, log, record, date, etc&#8230; can be a Lisp function. For HTML, we&#8217;d have the DOM structure, and each item in it would be executable. Some convenient hooks into the renderer, and your Lispified HTML renders itself! Another hook, say for the script tag, and your document becomes self-modifiable! We&#8217;re missing none of the dynamic content, just making it easier to parse and manipulate. I think if we switched we could build cathedrals on this stuff!</p>
<p>So please! What the web really needs is for this hideous architectural and syntactic nightmare to be slain like the monster it&#8217;s become! Since HTML really started as a document encoding format that focused on hierarchical structure, there&#8217;s no reason we can&#8217;t switch this to Lisp, like in Yegge&#8217;s logfile example. We lose none of the structure, and gain in simplified syntax. We lose none of the functionality, and gain enormously in our ability to parse, manipulate, and transform the document. Further, since Lisp is so elegant, we can also do more of the analyses required for securing, optimizing, and jit-compiling.</p>
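<p>To make the structure-preserving claim a bit more concrete, here&#8217;s a rough sketch in C++ that prints one and the same tree both ways. Everything in it is my own simplification, not anything from Yegge&#8217;s article: <code>Element</code>, <code>toXml()</code>, and <code>toSexp()</code> are made-up names, attributes and escaping are ignored, and every leaf is emitted as a quoted string (the logfile example above also uses bare numbers, symbols, and <code>nil</code>).</p>

<div class="wp_syntax"><div class="code"><pre class="cpp" style="font-family:monospace;">#include &lt;string&gt;
#include &lt;vector&gt;

// Hypothetical element tree: a tag holding either text or child elements.
struct Element {
    std::string tag;
    std::string text;               // used only when there are no children
    std::vector&lt;Element&gt; children;
};

// Render the tree as XML: &lt;tag&gt;...&lt;/tag&gt;
std::string toXml(const Element&amp; e) {
    std::string out = "&lt;" + e.tag + "&gt;";
    if (e.children.empty()) {
        out += e.text;
    } else {
        for (const Element&amp; child : e.children) out += toXml(child);
    }
    return out + "&lt;/" + e.tag + "&gt;";
}

// Render the very same tree as an s-expression: (tag ...)
std::string toSexp(const Element&amp; e) {
    std::string out = "(" + e.tag;
    if (e.children.empty()) {
        out += " \"" + e.text + "\"";
    } else {
        for (const Element&amp; child : e.children) out += " " + toSexp(child);
    }
    return out + ")";
}</pre></div></div>

<p>The two functions walk the tree identically and differ only in the delimiters they print, which is really the whole point: nothing about the hierarchy is lost going from one notation to the other.</p>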
]]></content:encoded>
			<wfw:commentRss>http://www.cogitolingua.net/blog/2010/12/21/the-future-of-the-web-should-be-lisp/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Cultural and Linguistic Sexism</title>
		<link>http://www.cogitolingua.net/blog/2009/07/26/cultural-and-linguistic-sexism/</link>
		<comments>http://www.cogitolingua.net/blog/2009/07/26/cultural-and-linguistic-sexism/#comments</comments>
		<pubDate>Sun, 26 Jul 2009 23:53:00 +0000</pubDate>
		<dc:creator>erich</dc:creator>
				<category><![CDATA[Language]]></category>
		<category><![CDATA[Politics]]></category>

		<guid isPermaLink="false">http://www.cogitolingua.net/blog/?p=236</guid>
		<description><![CDATA[<p>I was working on a paper today, and noticed something very peculiar about linguistic gender-neutrality. I know that we are all encouraged to use female characters in our examples to combat the inherent chauvinism of the English language. Despite following the recent gender politics over at Less Wrong (summarized by a post on The Nature [...]]]></description>
			<content:encoded><![CDATA[<p>I was working on a paper today, and noticed something very peculiar about linguistic gender-neutrality. I know that we are all encouraged to use female characters in our examples to combat the inherent chauvinism of the English language. Despite following the recent gender politics over at <a href="http://lesswrong.com/">Less Wrong</a> (summarized by a post on <a href="http://lesswrong.com/lw/13s/the_nature_of_offense/">The Nature of Offense</a>) and hearing Douglas Hofstadter explore the topic in some of his work, I&#8217;m still not entirely convinced that enough women feel alienated when males are used in examples.</p>
<p>Nevertheless, the issue has been raised to the level of my awareness, and I&#8217;m now sensitive to it. So I find myself in a huge conundrum, as my example involves shopping. So, I&#8217;m a bigoted sexist no matter what gender I choose!</p>
<p>Damn, English sucks!</p>
<p>P.S. Assuming that examples involving idiots and geniuses occur with equal frequency, what exactly stops people from using one gender as the canonical example for idiocy and the other gender for genius? Of course this association can be carried as far as one&#8217;s bigotry allows.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.cogitolingua.net/blog/2009/07/26/cultural-and-linguistic-sexism/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Building Linguistic Structure</title>
		<link>http://www.cogitolingua.net/blog/2009/06/04/building-linguistic-structure/</link>
		<comments>http://www.cogitolingua.net/blog/2009/06/04/building-linguistic-structure/#comments</comments>
		<pubDate>Thu, 04 Jun 2009 07:54:10 +0000</pubDate>
		<dc:creator>erich</dc:creator>
				<category><![CDATA[Engineering]]></category>
		<category><![CDATA[Ideas]]></category>
		<category><![CDATA[Language]]></category>
		<category><![CDATA[Mind/Cognition]]></category>
		<category><![CDATA[Punditry]]></category>

		<guid isPermaLink="false">http://www.cogitolingua.net/blog/?p=220</guid>
		<description><![CDATA[<p>Yesterday, I had an interesting thought. My advisor once made the cultural observation that many people in Computer Science invent their own language and then immediately write a self-hosting compiler. I agree that a compiler is quite a feat of engineering and serves as a nice test case to demonstrate that the language you&#8217;ve invented [...]]]></description>
			<content:encoded><![CDATA[<p>Yesterday, I had an interesting thought. My advisor once made the cultural observation that many people in Computer Science invent their own language and then immediately write a self-hosting compiler. I agree that a compiler is quite a feat of engineering and serves as a nice test case to demonstrate that the language you&#8217;ve invented is powerful enough that it can handle real-world complexity. Unfortunately, this test fails in a few important ways.</p>
<p>First, it doesn&#8217;t actually show as much as you think it might. There is a very strong filter on failed languages. By using this test the author runs the risk of re-designing the language specifically to insert constructs that help them build the compiler. Now, this isn&#8217;t necessarily a bad thing, except that compiler writing is now a fairly mature field. There are standard abstractions (esp. in the lexing and parsing) that a new language will probably not experiment with. So, the author will usually just build these existing and well-understood abstractions into the new language. Rather than encouraging language experimentation we get more of the same, but with different syntax.</p>
<p>Second, not all useful languages even have their own compiler. I&#8217;m specifically thinking of the domain-specific languages (DSLs). Nobody would write an awk interpreter in awk, or a mail engine using sendmail (even if it is Turing complete). These are languages designed to do a specific task; many of them are quite essential to their respective fields, but none of them are self-hosting. Nor should we expect them to be.</p>
<p>My argument here is that the cultural practice of writing a self-hosting compiler is a big distraction. New languages should be for experimenting with new linguistic constructs. We should be looking toward the DSLs, and incorporating their innovations into our more mainstream languages. Right now, we seem to be optimizing our languages for compiler construction.</p>
<p>I&#8217;d rather see our languages evolve in a different direction. I&#8217;m really eager to witness the birth of an AI. For this to happen though, we need languages for expressing patterns of thought, not patterns of bits. We need the ability to cohesively and flexibly assemble the stuff of thought. I&#8217;m thinking Society of Mind stuff here. We need languages that allow for statistical fuzziness, sloppy associativity, and the ability to construct metaphor.</p>
<p>The linguistic tools that we find useful for building compilers are not necessarily the same tools that will help us build a mind.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.cogitolingua.net/blog/2009/06/04/building-linguistic-structure/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Computer Language Comparison</title>
		<link>http://www.cogitolingua.net/blog/2009/05/31/computer-language-comparison/</link>
		<comments>http://www.cogitolingua.net/blog/2009/05/31/computer-language-comparison/#comments</comments>
		<pubDate>Mon, 01 Jun 2009 00:05:15 +0000</pubDate>
		<dc:creator>erich</dc:creator>
				<category><![CDATA[Comp*]]></category>
		<category><![CDATA[Language]]></category>

		<guid isPermaLink="false">http://www.cogitolingua.net/blog/?p=218</guid>
		<description><![CDATA[<p>Guillaume Marceau has used data from the Computer Language Benchmark Game to provide a graphical comparison of many different languages.</p> <p> If you drew the benchmark results on an XY chart you could name the four corners. The fast but verbose languages would cluster at the top left. Let&#8217;s call them system languages. The elegantly [...]]]></description>
			<content:encoded><![CDATA[<p>Guillaume Marceau has used data from the <a href="http://shootout.alioth.debian.org/">Computer Language Benchmark Game</a> to provide a <a href="http://gmarceau.qc.ca/blog/2009/05/speed-size-and-dependability-of.html">graphical comparison</a> of many different languages.</p>
<blockquote><p>
If you drew the benchmark results on an XY chart you could name the four corners. The fast but verbose languages would cluster at the top left. Let&#8217;s call them system languages. The elegantly concise but sluggish languages would cluster at the bottom right. Let&#8217;s call them script languages. On the top right you would find the obsolete languages. That is, languages which have since been outclassed by newer languages, unless they offer some quirky attraction that is not captured by the data here. And finally, in the bottom left corner you would find probably nothing, since this is the space of the ideal language, the one which is at the same time fast and short and a joy to use.
</p></blockquote>
<p>Of course the C compilers do a very good job on performance, but seem to do average on verbosity (better than I expected). Haskell (ghc) does a surprisingly nice job. I wish that I&#8217;d thought to do this kind of visualization; it&#8217;s really pretty neat. The only improvement that I could think of would be to put the performance axis on a logarithmic scale.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.cogitolingua.net/blog/2009/05/31/computer-language-comparison/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Automatic Thesaurus</title>
		<link>http://www.cogitolingua.net/blog/2009/05/07/automatic-thesaurus/</link>
		<comments>http://www.cogitolingua.net/blog/2009/05/07/automatic-thesaurus/#comments</comments>
		<pubDate>Fri, 08 May 2009 03:31:30 +0000</pubDate>
		<dc:creator>erich</dc:creator>
				<category><![CDATA[Ideas]]></category>
		<category><![CDATA[Language]]></category>

		<guid isPermaLink="false">http://www.cogitolingua.net/blog/?p=191</guid>
		<description><![CDATA[<p>Last week, I landed on another PhD worthy research project.</p> <p> Given a very large corpus of sentences, such as a digitized version of the Library of Congress, or a less noisy version of the Internet, how can you automatically generate a Thesaurus? </p> <p>At first I thought the problem should be fairly easy, but [...]]]></description>
			<content:encoded><![CDATA[<p>Last week, I landed on another PhD worthy research project.</p>
<blockquote><p>
Given a very large corpus of sentences, such as a digitized version of the Library of Congress, or a less noisy version of the Internet, how can you automatically generate a Thesaurus?
</p></blockquote>
<p>At first I thought the problem should be fairly easy, but the more I thought about it the more difficult and daunting the task became. For example, as a first approach, we might assume that textual substitution would be a good proxy for identifying synonymous terms. That is, if a couple of terms really are synonymous, then they ought to be substitutable for each other in a sentence. With a large enough set of sentences, we should be able to identify such situations, and thereby bootstrap the building of the thesaurus. But there&#8217;s a small problem, provided by my good friend <a href="http://www.qsl.net/kc6uut/">EvB</a>:</p>
<blockquote><p>
The sky is blue.<br />
The ocean is blue.
</p></blockquote>
<p>But sky is not the same as ocean. Sure they are similar. A poet could compose a nice metaphor of fish swimming through their sky above the bottom feeders. But this metaphorical relationship isn&#8217;t one that would necessarily make it into a human-compiled thesaurus. So, textual substitution can easily lead us astray.</p>
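<p>Just to show how little it takes to fall into this trap, here&#8217;s a naive C++ sketch of the substitution heuristic. It&#8217;s my own toy, with made-up names (<code>splitWords()</code>, <code>substitutionCandidates()</code>), and it assumes the corpus has already been lowercased and stripped of punctuation. Each word in a sentence gets blanked out in turn, and any words that fill the same blank are reported as substitution candidates.</p>

<div class="wp_syntax"><div class="code"><pre class="cpp" style="font-family:monospace;">#include &lt;cstddef&gt;
#include &lt;map&gt;
#include &lt;set&gt;
#include &lt;sstream&gt;
#include &lt;string&gt;
#include &lt;vector&gt;

// Split a sentence on whitespace; no handling of punctuation or case.
std::vector&lt;std::string&gt; splitWords(const std::string&amp; sentence) {
    std::istringstream in(sentence);
    std::vector&lt;std::string&gt; words;
    std::string w;
    while (in &gt;&gt; w) words.push_back(w);
    return words;
}

// For every word position, blank that word out and file the word under
// the resulting "frame". Words sharing a frame are substitution candidates.
std::map&lt;std::string, std::set&lt;std::string&gt;&gt;
substitutionCandidates(const std::vector&lt;std::string&gt;&amp; corpus) {
    std::map&lt;std::string, std::set&lt;std::string&gt;&gt; byFrame;
    for (const std::string&amp; sentence : corpus) {
        const std::vector&lt;std::string&gt; words = splitWords(sentence);
        for (std::size_t i = 0; i &lt; words.size(); ++i) {
            std::string frame;
            for (std::size_t j = 0; j &lt; words.size(); ++j) {
                if (j &gt; 0) frame += ' ';
                frame += (j == i) ? std::string("_") : words[j];
            }
            byFrame[frame].insert(words[i]);
        }
    }
    return byFrame;
}</pre></div></div>

<p>Feed it &#8220;the sky is blue&#8221; and &#8220;the ocean is blue&#8221; and the frame &#8220;the _ is blue&#8221; comes back holding both sky and ocean. The code is doing exactly what it was told, and that&#8217;s the problem.</p>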
<p>Continuing with EvB&#8217;s particularly good example we can also identify another problem. Suppose that we incorporate a bit of natural language understanding, enough to pull out parts of speech. Then, the system would easily identify the equation of sky with blue, or ocean with blue. But neither of these statements is true either. Usually people take the example to mean not that sky and blue are the same thing, but that the sky belongs to the set of objects that have a property called color, the value of which is blue. So this understanding depends on what the definition of &#8216;is&#8217; is (obviously not a simple affair). We also would like to avoid drawing a relationship between any of the pronouns and the rest of the language.</p>
<p>Next let&#8217;s look at how people tend to write. Any good library is gonna be full of metaphor, simile, pun, allusion, word play, sound play, and other such highly nuanced expression. All of these things will trump any reasonably simple attempt at drawing a link between synonymous words. Political propaganda and polemic will probably be particularly bad at equating terms that should be kept logically distinct. Furthermore, at least when I write, I&#8217;m reminded of other things during the process, things that are associated, but not necessarily synonymous. These remindings are an important part of the <a href="http://www.paulgraham.com/essay.html">essay writing process</a>, but will certainly throw noise into the digital library.</p>
<p>But if it&#8217;s so hard to make a mechanical system for identifying synonyms, then how do humans do it? Here I have a hypothesis: that similar words stimulate similar patterns in the brain. Thus, when a human tries to think up synonyms, it&#8217;s really the same as playing word association with a filter. First, the word stimulates the brain, bringing up certain associations. These associations will be based on &#8216;brain distance&#8217;, a measure of the similarity of brain activity for certain words and thoughts. But some associations will be radically different from the synonyms that we&#8217;re looking for. For example, antonyms and non-sequiturs often come up in word association games. So a filter is applied to weed these out, and what&#8217;s left is passed through a dictionary/meaning check. Anything passing this process will be reported as a synonymous term.</p>
<p>So, in order to really generate a thesaurus, we do need AI (or at least an underlying cognitive model). When I first thought of the thesaurus problem, I was hoping that it was pared down enough, small and simple enough, that it would be doable without all this complexity. We might have to reduce the problem further, make it looser. Say, build an association dictionary, rather than a thesaurus. An association dictionary might be possible, because it forgoes the understanding of meaning and similarity: it doesn&#8217;t have to question or measure why two words should be associated, only record that they are used similarly.</p>
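<p>One way such an association dictionary might be sketched, assuming that &#8220;used similarly&#8221; can be approximated by sharing nearby words: count each word&#8217;s neighbours within a small window, then rank other words by the cosine similarity of those counts. The names <code>context_vectors</code>, <code>cosine</code> and <code>associations</code>, the window size, and the toy corpus are all assumptions of mine for illustration.</p>

```python
from collections import Counter, defaultdict
from math import sqrt


def context_vectors(sentences, window=2):
    """Count, for every word, the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().rstrip(".").split()
        for i, word in enumerate(words):
            start = max(0, i - window)
            neighbours = words[start:i] + words[i + 1:i + 1 + window]
            vectors[word].update(neighbours)
    return vectors


def cosine(u, v):
    """Cosine similarity of two Counters treated as sparse vectors."""
    dot = sum(u[k] * v[k] for k in u)
    norm = sqrt(sum(c * c for c in u.values())) * sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def associations(sentences, word, top=3):
    """Return up to `top` words whose contexts most resemble those of `word`."""
    vectors = context_vectors(sentences)
    target = vectors.get(word, Counter())
    scores = [(cosine(target, vec), other)
              for other, vec in vectors.items() if other != word]
    return [other for score, other in sorted(scores, reverse=True)[:top] if score > 0]


corpus = ["The sky is blue.", "The ocean is blue.", "Fish swim in the ocean."]
print(associations(corpus, "sky"))
```

<p>Note that this only records that sky and ocean keep similar company. It makes no claim that they mean the same thing, which is the whole point of settling for associations.</p>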
<p>So, if you can automate the building of a thesaurus, you should get a PhD in Linguistics.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.cogitolingua.net/blog/2009/05/07/automatic-thesaurus/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
	</channel>
</rss>

