Information Flow – eric the fruitbat

Segregate Third-Party JS Libraries

erich — Mon, 12 Mar 2012 20:50:54 +0000

Typically, web authors simply load whatever library they’d like to use with full trust. In JS, such loading amounts essentially to a #include. I’m flabbergasted that this practice remains normal. It could be paranoia, but even without invoking all the security concerns, I’d be reluctant to include other people’s code simply because of the potential for a naming conflict on a global variable.

But, recently, in reviewing a new paper about JS information flow security, I had an interesting thought. (I should credit this thought to my co-worker, Christoph Kerschbaumer, because it was his phrasing of a bullet point that made me question existing practice.) Why don’t we use iframes to segregate the included code?

Currently, that can’t be done because the inter-iframe communication channel is hideous: it uses fragment identifiers in a URL to re-navigate a frame, which must continuously poll its document URL looking for that update. Of course, re-navigation has its own security issues [Securing Frame Communication in Browsers]. This mechanism simply operates too slowly and unreliably to use with a library.

Additionally, there’s a strong reason to keep iframes completely separate from each other: advertisements. Where a JS library is third-party code that you want to use, advertisement code is something that you expressly don’t want polluting your namespace or accessing your variables and memory. Because of syndication, it’s also code that you shouldn’t trust in any way. iframes currently provide the best mechanism to segregate an advertisement from the rest of the page.

But what if iframes came equipped with a communication channel specifically meant for inter-frame communication? This channel should, of course, come with a mechanism to attach security monitors, so that each use can be customized for an appropriate level of security. The channel should also act like a telephone: that is, JS on either side should be able to opt out and not answer a request for communication from the transmitter. The channel will need to be high-bandwidth, so that it can be used with libraries. And the segregation can use either message-passing semantics [good for memory separation (parallelism), might cost copies during communication (cf. Erlang for how to optimize this)] or reference-passing semantics [less communication overhead, bad for memory separation, potential for abuse: a library might access a complete data structure via referential transitivity].

I think the advantages here are pretty cool:

If the browser can execute iframes in separate memory spaces, perhaps this could be a way to have parallel code execution.
The iframe keeps untrusted JS behind a monitored firewall.
The ability to attach a security monitor to the channel should mollify most complaints about cross-iframe script contamination.
Talking to an iframe can have its own linguistic support by creating a JS object with overloaded getter/setter that wraps the communication. Calling library methods should look nearly the same as before, modulo a prefix for the interface (which is good, more like including a module).
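
To make the shape of such a channel concrete, here is a minimal sketch, written in Python rather than JS so that it runs without a browser. Everything in it is an assumption for illustration: the Channel and RemoteLibrary classes, the monitor callbacks, and the library_frame function are hypothetical names, not any browser API. It shows the three ingredients discussed above: a monitor attached to the channel, a receiver that can decline to answer, and a wrapper object that makes channel traffic look like ordinary method calls.

```python
import copy


class Channel:
    """Toy stand-in for a monitored inter-frame channel (hypothetical API)."""

    def __init__(self):
        self.monitors = []   # callables: message -> bool, True lets it through
        self.handler = None  # the receiver's callback; None means "not answering"

    def send(self, message):
        if self.handler is None:
            return None  # receiver opted out of the conversation
        if not all(monitor(message) for monitor in self.monitors):
            raise PermissionError("blocked by security monitor")
        # Deep copy models message-passing semantics: no shared references.
        return self.handler(copy.deepcopy(message))


class RemoteLibrary:
    """Makes channel traffic look like ordinary method calls on a module."""

    def __init__(self, channel):
        self._channel = channel

    def __getattr__(self, name):
        # Only reached for names that are not real attributes.
        def call(*args):
            return self._channel.send({"method": name, "args": list(args)})
        return call


def library_frame(message):
    # The "third-party library" living on the far side of the channel.
    exported = {"add": lambda a, b: a + b}
    return exported[message["method"]](*message["args"])


channel = Channel()
channel.handler = library_frame
channel.monitors.append(lambda msg: msg["method"] in {"add"})
lib = RemoteLibrary(channel)
total = lib.add(2, 3)
```

The deep copy is the message-passing choice from the trade-off above; a reference-passing variant would hand the message over uncopied and give up the memory separation.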

Update Tue Mar 13 20:45:34 PDT 2012: Christoph alerted me that I have to be more careful in my reading. Indeed, Barth’s paper (cited above) actually contains exactly this proposal, as well as an implementation (adopted in WebKit), postMessage. The remaining difficulty lies in library adoption. Many JS libs now have to be rewritten to use the new communication mechanism (or clients have to care enough to write wrappers).

A New Field: Information Type Flow

erich — Mon, 30 Jan 2012 18:35:17 +0000

In my last post on Information Flow, I noticed that some flows are more informative than others. I used a switch statement for my illustrative example of that observation. But, from my experience as a software developer, I have a small aversion to switch statements. Usually, when I feel compelled to use one, it’s because I’m not using OO design principles. It’s because that switch should really be replaced with a polymorphic dispatch.

So, thanks to my Super Awesome Postdoc, we coined a new field as a result. It stands to reason that Information Flow analysis can be performed on types, just as it has previously been performed on values. A collection of questions suggests itself:

Is it useful? Does sensitive/interesting/important information actually leak as a result of polymorphism? For a typical program, how much does the polymorphic dispatch reveal about your system? On the one hand, I think not so much, because the dispatch is taken on type-compatible instances. On the other, perhaps a lot, if you represent much of the problem’s domain in the type system (AuthenticatedCustomer vs Guest). How do standard practices such as Design Patterns affect a program’s information type flow?
What does the analysis entail? Does it require a statically typed language, so that you can easily identify the polymorphic call sites? Do you still have to provide instrumentation that takes place at runtime?
What about dynamic languages? Does type-information make its way into dynamic programs? Even in a dynamic language, are the programmers developing as if they had a strongly typed world?
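
A small sketch of what such a leak could look like, in Python. The class names come from the example above; the discount method, the checkout function, and the numbers are invented for illustration. The point is only that an observer who sees the result of a polymorphic call, and knows the class hierarchy, can recover the run-time type of the receiver.

```python
class Guest:
    def discount(self):
        return 0


class AuthenticatedCustomer(Guest):
    def discount(self):
        return 10


def checkout(visitor, price):
    # Polymorphic call site: the method that runs depends on the run-time type.
    return price - visitor.discount()


def infer_type(observed_total, price):
    # An observer who knows the hierarchy works backwards from the value.
    return "AuthenticatedCustomer" if observed_total < price else "Guest"


guess_a = infer_type(checkout(AuthenticatedCustomer(), 100), 100)
guess_b = infer_type(checkout(Guest(), 100), 100)
```

Here the type acts exactly like the branch condition of an if statement, which is why value-level information flow analysis seems like it ought to carry over.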

Not All Flows are Considered Equal

erich — Mon, 23 Jan 2012 19:56:24 +0000

When I was writing my last post about information flow terminology, I noticed something interesting: when knowledge of control flow is used to determine the values of variables, some branches yield more information than others. Previously, I had only considered the binary if-then-else branch. Today, I shall examine a switch-case statement, which exhibits asymmetric information flow.

/* pick s from 1 to 3, inclusive */
switch(s) {
case 1: k = 1; break;
case 2: k = 2; break;
case 3: k = 2; break;
}

After this code segment has executed, k is one of two values: 1 or 2. However, one of these values is more informative for the observer. If an observer sees k == 1 then she can infer that s == 1. On the other hand, if the observer sees k == 2 then she can only infer s != 1 (or s in {2,3} with knowledge of s’s distribution).

For control structures this simple, a static analysis could be run on the code, and provide a result graph of how influential each variable is on others. This graph could be used to quantify how much information might be leaked to an observer. Unfortunately, it comes with the limitation that each branch is considered as drawn from a uniform distribution, which may not be true of actual runtime values.
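
As a rough sketch of that quantification for the switch above, the following Python computes how many bits an observer learns about s from each observed value of k. It assumes, as stated, that s is uniformly distributed; the function names are my own. With a uniform prior, the posterior is uniform over the inputs that map to the observed output, so the information gained is the prior entropy minus the log of that set's size.

```python
from collections import defaultdict
from math import log2


def leak_per_observation(f, secrets):
    """Bits learned about the secret for each output, assuming uniform secrets."""
    preimages = defaultdict(list)
    for s in secrets:
        preimages[f(s)].append(s)
    prior = log2(len(secrets))
    # Posterior is uniform over the preimage, so its entropy is log2(size).
    return {k: prior - log2(len(group)) for k, group in preimages.items()}


def switch(s):
    # Same mapping as the switch statement: 1 -> 1, 2 -> 2, 3 -> 2.
    return {1: 1, 2: 2, 3: 2}[s]


leaks = leak_per_observation(switch, [1, 2, 3])
```

Seeing k == 1 reveals all log2(3) ≈ 1.58 bits of s, while seeing k == 2 reveals only about 0.58 bits, which is the asymmetry described above.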

Nevertheless, I think the ability to statically quantify how much information can be leaked via control flow branches might be useful for:

deciding which variables need tracking,
where instrumentation/wrappers/labels might be omitted,
where inspections/checks must be inserted,
finding variables which fall below a certain leak threshold.

I would really like such an analysis to tell me, for example, whether a variable holding a password is potentially leaked, and if so, how many bits. Or, to list all the variables of my program in order of how much they might leak.

The inference here sounds the same as in Probabilistic Predicative Programming by Eric C.R. Hehner (whom I met when Wayne Hayes invited him to give a Friday Seminar talk in Dec 2010), except that it runs backwards on the control flow.

New Terminology in Information Flow Research

erich — Fri, 20 Jan 2012 02:47:57 +0000

Information flow is about tracking the flows of information within a computer program, i.e. which values influence other values as the program executes. Denning and Denning looked at this problem in the late 1970s [1, 2] and distinguished between flows that occur due to a data dependence (such as assignment) and flows that occur due to a control dependence (such as an if-else branch). Let’s now examine each of the possible ways in which information can flow between variables in a program.

Explicit Flows

The most direct way that information can flow between variables is through an assignment statement. For example, the simple line of code b := a causes information to be transferred a→b. However, we could also have a series of dependent assignments, in which information will flow a→b→c. This kind of indirect flow, a→c, can be seen in the series of statements: b := f(..., a, ...); c := g(..., b, ...);.
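
A minimal sketch of tracking explicit flows, in Python. It assumes straight-line code only (no branches or loops) and an invented encoding of each assignment as a (target, operands) pair; the flows_from name is mine.

```python
def flows_from(source, assignments):
    """Variables holding information from `source` after straight-line code.

    assignments: list of (target, operands) pairs in program order.
    """
    influenced = {source}
    for target, operands in assignments:
        if influenced & set(operands):
            influenced.add(target)      # direct or indirect explicit flow
        else:
            influenced.discard(target)  # overwritten with unrelated data
    return influenced - {source}


# b := f(..., a, ...); c := g(..., b, ...); d := h(e)
program = [("b", ["a", "k"]), ("c", ["b"]), ("d", ["e"])]
reached = flows_from("a", program)
```

Here reached contains b (direct) and c (indirect), but not d.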

Implicit Flows

Since we are trying to assess the influence of some variables on other variables, we must also take into account control flow within the program. For example, a variable might be given a particular value only if some condition is satisfied. I’ve constructed an example of this kind of implicit flow x→y:

bool x := ...
bool y := 1
bool z := 0
if (x == 0) then
    y := 0
else
    y := 1
    z := 1

Any observer examining the value of y after the code is run, combined with a priori knowledge (gained from a static analysis), can infer the value of x in the conditional. Lest we think that such powers of inference are not worth investigating, consider if the observer is some code that an attacker was able to inject into the system and x represents a secret pin or bit of authentication. Further scrutinizing the above code, we will go through each of the two possible cases (for observers restricted to inspecting the variable y):
x == 0: The variable y is modified to 0. The executed branch exerts a direct influence on the value of the observed variable. The observer can determine which branch was taken by inspecting the value of the modified variable.
x == 1: Even though y is modified, its value remains unchanged. Because there are only two branches, and y is set to a unique value in each one, the observer can still determine the value of the conditional variable x.

Let’s now change the game slightly, and restrict the observer to inspecting the variable z. There are still two cases to consider:
x == 0: The variable z is left untouched. But, because the observer knows the value was left unchanged, it can still determine which branch was taken.
x == 1: The variable z is modified, and thus directly informs the observer of which branch was taken.

It should be clear now, that if an attacker can inject code like that above, they can use knowledge of branches not taken to infer values of variables used in the branch conditional. In a world of binary branches, observing variables left unmodified can be just as revealing as a direct assignment.

We therefore have to carefully distinguish information gained via branches taken (accessible by dynamic analysis at runtime) from information gained via branches not taken (accessible by static analysis of the source code). Denning’s work did not fully distinguish these two cases, so some recent famous work [3] re-used the term ‘indirect’ for implicit flows from branches taken at runtime. However, in seeking clarity, I now prefer the term active flows for implicit flows in branches taken at runtime, and passive flows for implicit flows from branches not taken. (These choices were made in consultation with my postdoc, who first noticed that ‘indirect’ had been accidentally overloaded.)
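
To see why the distinction matters, here is the earlier x, y, z example again as a Python sketch with a deliberately naive dynamic tracker bolted on. The tracker is my own invention for illustration: it marks a variable as tainted only when it is assigned inside the branch that actually executes.

```python
def run(x):
    """Run the example; return (y, z, tainted).

    `tainted` holds the names of variables assigned while a branch
    on x was executing, which is all a purely dynamic tracker sees.
    """
    y, z = 1, 0
    tainted = set()
    if x == 0:
        y = 0
        tainted.add("y")
    else:
        y = 1
        z = 1
        tainted.update({"y", "z"})
    return y, z, tainted


y0, z0, tainted0 = run(0)
y1, z1, tainted1 = run(1)
```

When x == 1, both y and z are tainted (active flows). When x == 0, z is never assigned and so never tainted, yet z == 0 still tells the observer that x == 0. That untracked case is the passive flow.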

Let’s summarize this terminology in a nice table:

Category	Descriptor	Example	Flow	Analysis
Explicit	Direct	b := a	a→b	Dataflow
Explicit	Indirect	b := f(..., a, ...); c := g(..., b, ...)	a→c	Dataflow
Implicit	Active	x := true; if (x) y := 1 else ...	x→y	Control Flow (dynamic)
Implicit	Passive	x := true; z := 0; if (x) ... else z := 1	x→z	Control Flow (static)

References

[1] @article{denning1976lattice,
    title={A lattice model of secure information flow},
    author={Denning, D.E.},
    journal={Communications of the ACM},
    volume={19},
    number={5},
    pages={236--243},
    year={1976},
    publisher={ACM}
}

[2] @article{denning1977certification,
    title={Certification of programs for secure information flow},
    author={Denning, D.E. and Denning, P.J.},
    journal={Communications of the ACM},
    volume={20},
    number={7},
    pages={504--513},
    year={1977},
    publisher={ACM}
}

[3] @inproceedings{jang2010empirical,
    title={An empirical study of privacy-violating information flows in
    JavaScript web applications},
    author={Jang, D. and Jhala, R. and Lerner, S. and Shacham, H.},
    booktitle={Proceedings of the 17th ACM conference on Computer and
    communications security},
    pages={270--283},
    year={2010},
    organization={ACM}
}

Strong Typing for Security

erich — Fri, 11 Nov 2011 08:13:11 +0000

I got into a mild argument about static vs. dynamic typing. I recognize that static typing can be verbose to the point of being repetitious. Take Java generics for example:

  
List<String> astr = new ArrayList<String>();

There really isn’t a great reason why the compiler can’t infer the type of the variable from the right-hand side of the assignment. C# already implements type inference for this case, and C++ is adding it. ML and Haskell are strongly typed and have practiced type inference since their inception. So we should actually dismiss the verbosity objection to static typing right now, because it’s an artifact of implementation that the more popular languages, C++ and Java, represent really poor examples of what could otherwise be a really good thing.

In my opinion, a static typing system is actually a proof system over your code. We shouldn’t complain about having compiler errors, rather we should rejoice that the compiler is able to automatically detect cases where we were ambiguous or tried to do something ill-defined. We should try to write our code so that the compiler can tell us when we make a mistake. Really, we want to express as many constraints as possible, so that the machine can do more checking and we end up with less buggy code. Statically typing all our variables and expressing systematic constraints is an effort that pays off in spades for large code bases.

But couldn’t we all just use the more flexible dynamic typing languages, and catch the bugs with testing? In my opinion, no. Testing should be done anyway, but it isn’t enough to prove the absence of a bug. Only a proof checker, such as a static typing system, can come close to doing that. I think I can really drive this point home by examining web applications.

The Problem

Web applications are really glorified string processors. HTTP requests come in as strings, and web pages are emitted as strings. JavaScript processes more strings in the page layout, potentially requesting even more information from the server in response to user-generated events. Forums, Social Networking, and other participatory applications allow for user-generated content. This widespread and popular practice actually leaves our glorified string processor (web app) at risk: if we are not careful, a malicious user can supply a string which, if it appears in the ‘wrong’ context, might be interpreted as legitimate JavaScript code by the application. That is, malicious users can execute arbitrary code, with the same rights and privileges as the application itself. This vulnerability is known as Cross-Site Scripting (XSS).

So, we find ourselves writing a string processor which must deal with strings of various encodings, special characters, and escape conventions: namely, HTML, JavaScript, XML, CSS, and URL. If one of these strings (even from our own database) manages to arrive in a context without first going through a filter to sanitize it, then our application has a security vulnerability. Do you think that it’s possible to write test cases (or even auto-generate them) given all the code paths, all the different sources (user, cookie, url, database, etc.) and all the contexts in which a string might appear? In my opinion, the exponential complexity makes testing an infeasible approach. What we really need, then, is a proof system to verify that no strings end up in the wrong context.

Static Typing to the Rescue

If we are willing to go back to our application and examine it in detail, we find that we should really be treating each of the above strings as different types. HtmlString should be a different type from JSString, which are again both different from UrlString. Simply expressing each context as a different type enables our static typing system to verify that we never use the wrong kind of string in the wrong context. We can also provide explicit conversion functions, which provide the proper escaping and sanitization when moving from one context to another.

void addToDocument(HtmlString hStr);
HtmlString fromURL(UrlString uStr);
UnsafeString HttpRequest(UrlString uStr);
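
For a feel of how this plays out, here is a sketch in Python. The class and function names (UnsafeString, HtmlString, to_html, add_to_document) are hypothetical and only loosely mirror the declarations above. Python is dynamically typed, so the compile-time guarantee argued for here would have to come from an external checker reading the annotations (an assumption of this sketch); the isinstance test is only a run-time backstop. The escaping uses the standard library's html.escape.

```python
import html
from dataclasses import dataclass


@dataclass(frozen=True)
class UnsafeString:
    raw: str  # anything from a user, cookie, URL, or database


@dataclass(frozen=True)
class HtmlString:
    markup: str  # safe to place in an HTML text context


def to_html(text: UnsafeString) -> HtmlString:
    # The only sanctioned route from untrusted text into the HTML context.
    return HtmlString(html.escape(text.raw))


def add_to_document(document: list, fragment: HtmlString) -> None:
    # Run-time backstop; a static checker would reject the bad call earlier.
    if not isinstance(fragment, HtmlString):
        raise TypeError("expected HtmlString, got " + type(fragment).__name__)
    document.append(fragment.markup)


page = []
comment = UnsafeString("<script>alert(1)</script>")
add_to_document(page, to_html(comment))
```

Passing comment straight to add_to_document is a type error, so the only way for user input to reach the page is through the escaping function.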

Language Support

What’s most unfortunate about this approach is that neither C++ nor Java provides us with an easy way to distinguish two kinds of string. We certainly don’t want to use C’s typedef, because that enables automatic coercion between the different kinds of string, which defeats the point. So, we’re forced into creating a separate class for each of these strings, including implementing all the operators that make for convenient string manipulation. I’d really love a language that would allow me to extend my existing string type without fully re-implementing everything, yet still be able to treat the extension as a completely different type.

Conclusion

Essentially we’re using the static typing as a proof system to constrain our programming practices. The static type verification provides a proof that we never use a string in the wrong context. In my opinion, this coding technique is of enormous benefit, and represents a use-case that dynamic typing + unit testing simply cannot approach.

The real trick is recognizing that two strings aren’t necessarily the same type.

Just for reference, I did not come up with this example myself.
Joel Spolsky advocates using Hungarian notation, which I think is too weak for solving security vulnerabilities.
Tom Moertel provides an implementation of this approach in Haskell.

Documentation for Progress

erich — Sat, 24 Sep 2011 00:12:43 +0000

I’ve noticed recently that documenting my work is one of the most reliable ways of making steady progress. I likely gathered the idea from the internet somewhere, or perhaps from the generous amounts of advice spewed forth from my postdoc. But I do remember that, when I was looking up some material on the Scheme publishing language Skribilo, I came across the Nonpareil Project. The author, Jeffrey H. Kingston, has kept a nice summary log of all the detailed work that goes into such a project. I’d like to quote a bit, just to give the flavor of the summary:

22 March 2007. The new version of the type system is compiled and tested today. This is about 12 days after I started the revision. The next problem, causing a core dump today, is that range types get frozen but are left unresolved at the end of MatchFunction, so we can’t be sure about coercions and run-time types at that point; we have to defer those until we descend again, during code generation. So there is some re-thinking and rewriting to do there.

23 March 2007. Sorting out the relationship between coercions and subtype tests. This has led to a realization that subtypes have to be tested twice: once when manifesting, without asking for coercions because they are unavailable in the presence of range types, and once during code generation after range types have all been resolved, at which point an error could occur. Also privatised expr_rec further, hiding it from its own subtypes.

28 March 2007. Finally finished working on the relationship between coercions and subtype tests. There were two problems: when the upper constraint is a meet type, and when the lower constraint is a variable. Solutions to both have now been documented, implemented and tested. The next step is to insert subtype calls, and the resulting coercions, during code generation.

As you can see, it’s a pretty high-level summary of the coding concerns for that day. This log runs from 2002 to 2008, and documents the progress of the project. Since I’ve now moved to WebKit in my Information Flow project, I’ve also started keeping a similar log, recording each minor milestone of implementation work. I’m expecting this log to help me write my thesis, as it will remind me of details that I might otherwise forget.

I’ve also experienced a secondary benefit from keeping this log. The desire to grow the log compels me to make progress in my work. I’m motivated to do tasks, just so I can mark them as done in the log. Additionally, since the log is geared toward recording smaller changes, I can break big todos up into a series of loggable items. I no longer get flummoxed by thinking of the sum total of work that might go into the implementation. Instead, I’m motivated by smaller tasks, which are easily accomplished.

Finally, the very act of documenting the work, and putting my thoughts into words, has helped to concretize my ideas and concerns. Instead of worrying abstractly about nebulous feelings, I now have to mold them into precise explanations. Doing this helps me to realize which aspects are ignorable, and what specific actions I can take to answer the open questions.

Finally, I have a technique that allows me to make steady, targeted progress.

Comparison between Object Capabilities and Information Flow

erich — Wed, 04 May 2011 00:02:18 +0000

Augmenting the Capability Model with Information Flows

I’ve already convinced myself that labels are best implemented as tags on primitive values and references. In JavaScript, it is sometimes useful to view an object as a heterogeneous hash table, mapping field names to data. Having to provide a single label for the object itself, rather than a label on each datum, necessarily leads to a conservative labeling, and with it a malignant amount of label creep.

Given that the mechanism for labeling primitive values naturally accommodates the tagging of references (since references are implemented as a special case of primitive value) we can easily forgo labeling of objects and the resultant complications. By tagging references rather than the objects themselves, we can more precisely track the path of access to a piece of data and avoid the label creep that results from conservative analysis. Of course, tagging references comes at a cost:

Every reference (and primitive value) increases in size. The exact amount depends on the implementation, but moving to a 128-bit internal value is conceivable (that’s 64 bits for a label pointer and 64 bits for the original encoded value).
Labels will have to be unioned, at runtime, at every step along a reference chain. This implies that label union operations will need to be quick. Offhand, I would expect most of these operations to be useless (unioning a sublabel into an existing label).
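
A sketch of what labeled references and the per-step union might look like, in Python. All names here (Tagged, deref, the principal names in the labels) are illustrative assumptions, labels are modeled as plain frozensets, and objects are modeled as dicts from field names to tagged values; a real VM would do this inside its value representation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tagged:
    """A 'fat value': a primitive or reference plus its own label."""
    value: object
    label: frozenset = frozenset()


def deref(ref: Tagged, field: str) -> Tagged:
    # Following a reference unions its label into whatever is fetched.
    target = ref.value[field]
    return Tagged(target.value, ref.label | target.label)


secret = Tagged("hunter2", frozenset({"alice"}))
profile = Tagged({"password": secret, "name": Tagged("Alice")})
account = Tagged({"profile": profile}, frozenset({"bank"}))

password = deref(deref(account, "profile"), "password")
name = deref(deref(account, "profile"), "name")
```

The password ends up labeled with both bank and alice, while the name picks up only bank from the path used to reach it. A single label on the whole profile object would have forced the name to carry alice as well, which is the label creep described above.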

The most interesting feature about labeling the references comes to light when we compare this form of the labeling model to the object capabilities model. Under the ocaps model, we find some key properties (blatantly stolen from [1]):

Delegation. An object has the authority to message another object, if it has a reference to that other object. In general, this represents an excellent approach to security. I see only one drawback: I, as a system designer, would like to be guaranteed, statically at compile time, that some (untrusted) objects will not be able to obtain, by accident or otherwise, references to certain other (trusted) objects, at run time. I believe that labeled references will be able to patch up this gap in knowledge, by enforcing a run-time check on the use of all references. Messages that untrusted objects make to trusted ones can be detected when they occur, as the trust relationship will be encoded in the reference path actually used to send the message, and monitored, at each step, by the VM. (How does this formally compare to a reference monitor?)
In [2], delegation is addressed by extending the core language with an expression “let(e₁ ≤ e₂) in e₃”. Scoping becomes critical here: the delegation is allowed (explicitly, in code) only for the operations that occur in e₃; the original hierarchy is restored after those operations complete.
Dynamic Subject Creation. Within the ocaps model, each object is a subject, so the granularity with which authority can be delegated is very fine. Conventional approaches to info flow resemble the Access Control List (ACL), in that the principal actors are first specified in a static hierarchy, and usually represent a much coarser model (at the level of users). Addressing this issue for info flow, requires a mechanism for the creation of new subjects. The delegating let mentioned above is not necessarily sufficient, for that form only specifies that e₁ and e₂ both evaluate to principals. An additional mechanism is needed for instantiating a run-time principal. Such flexibility implies that we shall need first-class principals (and further, first-class labels) in our implementation.
The introduction of first-class principals and labels exposes the principal hierarchy to the running program. An extensive analysis should be done on what operations (introduction of new principals, deletion of existing principals, addition of actsFor edges) can be allowed and under what circumstances. The last thing we want from this exposure is to allow the principal hierarchy to become a communication channel in its own right. I believe [2] does such an analysis for its proposed lambda calculus, but I glossed over the technique. Offhand, I would guess that introducing a new principal (using an unforgeable UUID) could be safely done at any time, and that possessing a principal’s id and satisfying the actsFor check ought to be enough for deletion (caution: what if it is still active elsewhere in the system? Perhaps garbage collection is the better strategy). The addition of an actsFor edge is already discussed in [2].

Presupposing the addition of a language mechanism for instantiating a new principal, I’d like to be able to argue (via direct code demonstration) that the introduction of new principals which possess a strict subset of one’s own responsibilities is enough to fully emulate the ocaps model. Comparing each model reveals an interesting difference in approach: The ocaps model uses membranes and proxy objects to delegate and divide authority amongst a spawn of subjects, while the info flow model uses the linguistic concepts of scope and stack. I think there might be a critical separation when it comes to revocation: the ocap model supports revocation at any time (by toggling a proxy forwarding object). I haven’t seen info flow literature discuss revocation, so I’m not sure how to do revocation via the info flow principal hierarchy. However, it naturally occurs when the above let has finished execution.
Subject Aggregated Authority Management. In both the ocaps and decentralized information flow models, subjects have the ability to edit only their own attributes. This differs from an ACL, where the ability to edit permissions is typically also conflated with the ability to edit the entire ACL. The decentralized nature of the info flow model really helps in defusing the problem of unwarranted privilege escalation. We should still take care, though, that allowing first-class principals and the ability to add/remove actsFor relations does not accidentally revive the problem.
One aspect of capability leakage should be addressed now: is it possible for a capability to leak from one domain to another via a data channel? That is, if it is possible for the system to pass around, read, write, and execute capabilities, then does it follow that objects can transmit capabilities anywhere they can transmit data? [1] points out that in type-enforced or partitioned ocaps models, this leakage cannot occur. Given that the problem can be solved when cast as a type-enforcement issue, I see no reason why info flow wouldn’t be able to cope. Although the client programmer should be allowed to create new first-class principal objects (representing capabilities), I see no reason why this implies that said objects should be serializable into a data stream. A fully encapsulated unique identification scheme can hide this ability from the programmer, while still maintaining the unforgeability requirement.
Ambient Authority. The ocaps model requires subjects to present a form of authentication before exercising their authority. This requirement is phrased in terms of capability possession: authorization to access is granted via a capability delegation, so the very possession of a capability implies the subject’s right to access the corresponding resource (via a method invocation). Within the info flow model, we have a different view of the world: the right to access a resource can be delegated in similar fashion (do we need strong code discipline to enforce this point?), but the subject’s right to access is additionally dependent on the context in which the resource is used. This additional restriction takes the form of a runtime check on the current program counter label, preventing implicit information leakage. In this manner it seems possible to reject access that would otherwise be granted under an ACL model. For example, if a nefarious program ‘guessed’ that it should have read access to a global object, this access is potentially deniable based on the fact that the program counter reveals the access takes place from within untrusted code. (Perhaps code discipline isn’t needed after all.)
Composability of Authority. Within ocaps, we see that because resources are representable by subjects, there is no separation between the two. As a result, the system is unified under a single abstraction: subjects can be resources, resources can be subjects. Nothing within the info flow model prevents this organization, so it too can benefit from the composability. JavaScript in particular encourages this organization with its prototype and object model. The only distinction to worry about is the difference between primitives and objects. But by using fat values, we can again achieve the desired uniformity: each primitive has attached a label describing the capabilities (in terms of confidentiality and integrity labels) invokable on the data. Object references are just another primitive, and will naturally comply with the existing design (the important difference being that references also hold a de-ref capability).
Access-Controlled Delegation Channels. Under the ocaps model, access to a resource can only be acquired via (1) initial object creation or (2) a communication channel. This means that communicating an access right requires a prior link between objects. Under the iflow model, each reference (to object or resource) carries the context (program counter) in which it was manufactured. Passing around these references can lead to further security restrictions, as the security label on the reference might be upgraded to reflect the context under which it was copied. Access to any global data will also have to follow a path of references, with a label union at each step. Although it’s possible to pass a reference through a shared object (using that object as a data channel), the label on the passed reference will encode this activity. So, despite the communication of references through a data channel, the acquired reference might still incur a security violation at the time of use. (Proof needed: this is enough security, we don’t require the access-controlled delegation.)
Dynamic Resource Creation. Because the ocaps model has object level granularity, it can dynamically create partitions of existing capabilities. In the info flow model, the ability to create new partitions (and combinations) of existing capabilities would incur changes to the principal hierarchy, and the introduction of a new label into the label lattice. As long as these modifications comply with their own set of security restrictions, the info flow model also allows dynamic resource creation.
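
The scoped delegation discussed under Delegation above can be mimicked with a context manager. This is a Python sketch under loud assumptions: the Hierarchy class is hypothetical, principals are plain strings, and I assume that inside the block the grantee may act for the grantor, which may not match the exact reading of ≤ in [2]. What it does show is the scoping property: the actsFor edge exists only for the duration of the block, so revocation falls out of scope exit.

```python
from contextlib import contextmanager


class Hierarchy:
    """Hypothetical run-time principal hierarchy; (a, b) means a actsFor b."""

    def __init__(self):
        self.edges = set()

    def acts_for(self, a, b):
        # Reflexive, transitive reachability over actsFor edges.
        seen, stack = set(), [a]
        while stack:
            p = stack.pop()
            if p == b:
                return True
            if p not in seen:
                seen.add(p)
                stack.extend(q for (x, q) in self.edges if x == p)
        return False

    @contextmanager
    def delegate(self, grantor, grantee):
        edge = (grantee, grantor)
        added = edge not in self.edges
        self.edges.add(edge)
        try:
            yield
        finally:
            if added:  # restore the original hierarchy on scope exit
                self.edges.discard(edge)


h = Hierarchy()
before = h.acts_for("widget", "page")
with h.delegate("page", "widget"):
    during = h.acts_for("widget", "page")
after = h.acts_for("widget", "page")
```

Unlike toggling an ocap proxy, this gives no way to revoke in the middle of the block, which is the gap in revocation noted above.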

It’s clear that JS allows programming within the ocap model, as long as code discipline is enforced (say via a source-to-source translation like Caja). However, the objectives of each security model are fundamentally different: ocaps seeks to give the programmer flexibility to enforce somewhat arbitrary security policies, mainly aimed at restricting access to certain objects (such as global objects), while info flow seeks to prevent a very specific attack: information leakage.

Ocaps has a fundamental problem with guarantees of the enforced security. Because the security enforcement is based on object accessibility, it can only be checked by walking the run-time reference graph. The composability principle above helps to make an argument that the graph can only evolve in secure steps, but this proof is not sufficient, because it is easy to forget about a corner-case syntactic form that bypasses the composability. Info flow can mitigate this problem, because it will offer a guarantee that no resource is used in a disallowed manner. So even if information is both obtainable and manipulable, it will not be able to leave the system, because of a runtime check on external (network) communications. The only channels that would be difficult to protect would be those in native objects; everything JS-only is kept safe.
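
The runtime check on outgoing communication can be sketched as follows. This is an illustration under assumptions, not a description of any existing monitor: `labeled`, `makeSink`, and `guardedSend` are hypothetical names, a label is assumed to be the set of principals whose data influenced a value, and no real network call is made.

```javascript
// Hypothetical model: a value carries the Set of principals whose data influenced it.
const labeled = (value, ...principals) => ({ value, lbl: new Set(principals) });

// A sink lists the principals whose data its receiver is cleared to see.
function makeSink(name, ...clearance) {
  return { name, clearance: new Set(clearance), sent: [] };
}

// The check runs at the boundary, not at the point where the value was obtained.
function guardedSend(sink, lv) {
  for (const p of lv.lbl) {
    if (!sink.clearance.has(p)) {
      throw new Error(`flow of ${p} data to ${sink.name} is not allowed`);
    }
  }
  sink.sent.push(lv.value);
}

const adServer = makeSink("ads.example", "ads");
guardedSend(adServer, labeled("banner-id-7", "ads"));   // allowed

let blocked = false;
try {
  guardedSend(adServer, labeled("hunter2", "user"));    // stopped at the boundary
} catch (e) {
  blocked = true;
}
console.log(adServer.sent, blocked);
```

Code holding the labeled value can read and transform it freely; only the attempt to send it to a sink without clearance fails.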

[1] Capability Myths Demolished
[2] Run-time Principals in Information-flow Type Systems

Approaches to JavaScript Security

erich — Thu, 28 Apr 2011 22:45:16 +0000

This is, as best as I can give right now, an exhaustive enumeration of all the different approaches to JavaScript security.

Source Translation.
Performs a source-to-source translation of JS into a secure subset. The technique is used to jail an included script, passing to it only those references to the outside world that it absolutely needs, and preventing it from following reference chains to an outside environment.
Pros: flexible, follows the object capabilities model, references passed in can implement security monitoring.
Cons: requires parsing the included script, disallows some JS forms (eval), not much can be done if the included script was written in a way that required passing in more authority than is needed.
Used by: Caja, Jacaranda, FBJS, ADSafe
Security Typing.
Since non-interference can be proved within a sufficiently powerful (and convenient) typing system, we naturally reach for a language solution.
Pros: the type system can prove the code obeys non-interference
Cons: crufty syntax introduced for label types, label types difficult to model (polymorphic labels, subtyping, tracking the program counter label, etc), programmer must simultaneously satisfy two type systems (admissible program lies in the intersection of orthogonal type systems), requires static type system
Used by: JiF
Bytecode Instrumentation.
Addition of security bytecodes, and necessary modifications to an existing VM so that it ensures non-interference dynamically.
Pros: works for dynamic languages, fails only on code that actually tries to leak (rather than rejecting code that might leak)
Cons: difficult to implement, runtime slowdown
Used by: JSFlow (my project)
Bytecode Typing.
Perform security typing on the bytecode instead of on the source.
Pros: a verifier can refuse code that might leak when that code is loaded (could support eval, by staging the non-interference proofs), this verifier could itself be proved (if implemented in something like Coq)
Cons: bytecode has less knowledge than the source text (although, with JavaScript the parser could annotate the bytecode with security-related hints, or techniques for typing SSA could also be used), interpreter produced by Coq would almost certainly underperform.
Used by: ???
Bytecode Translation.
Similar to the secure source subset, but now we parse JavaScript into a bytecode that’s secure by construction (then execute it on a secure VM).
Pros: The bytecode/VM can be based on a proved core language (such as Flanagan and Austin’s lambda-info)
Cons: No such VM exists (could be a research project unto itself), would have to implement a JS source-to-secure bytecode translator, and interfacing the secure VM to a browser would be very onerous.
Used by: ???
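
As an illustration of the first approach in the list, here is a minimal sketch of the discipline that source translation aims for: the guest receives only the references it needs, and those references can carry a monitor. It does not show the translation step itself and is not how Caja or the other listed tools work internally. The names `thirdPartyWidget` and `jail` are hypothetical, and the guest is assumed to be already confined (in plain JavaScript nothing stops a function from reaching globals).

```javascript
// Hypothetical third-party code, assumed already rewritten so that it
// touches nothing except the `env` object it is handed.
function thirdPartyWidget(env) {
  env.log("widget started");
  return env.title.toUpperCase();
}

// The host decides which references cross the boundary, and wraps them
// in a Proxy that records every property the guest reads.
function jail(guest, granted) {
  const accessed = [];
  const env = new Proxy(Object.freeze({ ...granted }), {
    get(target, prop) {
      accessed.push(String(prop));
      return target[prop];
    },
  });
  return { result: guest(env), accessed };
}

const lines = [];
const run = jail(thirdPartyWidget, {
  title: "hello",
  log: (msg) => lines.push(msg),
});
console.log(run.result, run.accessed, lines);
```

The last con listed for source translation shows up here too: if the guest had been written to expect the whole `document`, the host would have to grant more authority than the task needs.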

In general, I think that the approach we currently use gives the best trade-off in terms of implementation detail, client-visible changes, and implementation effort. However, because I happen to have acquired paranoia (one of the side-effects of working with the security conscious), I find that I cannot completely trust that our implementation behaves as desired. In particular, I wish to have a formal proof that we handle all cases appropriately. I see two major obstacles: (1) Giving a full and complete specification of JS is tedious and itself error-prone, but has been done by hand, and (2) having a proof on one hand, and an implementation on the other does not guarantee that the implementation meets the proof, while an auto-generated implementation from the proof is a no-go because of both performance and integration.

Security Typing for JavaScript

erich — Wed, 27 Apr 2011 01:27:26 +0000

Devil in the details.

I’d like to repeat an example (given by Mark Miller in his work on E) of two different ways to copy a file, and the security implications of each. First,

shell$ cp foo.txt bar.txt

This command invokes a copy program that will:

1. recognize foo.txt as a filename.
2. recognize bar.txt as another filename.
3. open the file indicated by (1) and read its bytes.
4. open the file indicated by (2) and write bytes into it.

Clearly, the cp command has been delegated quite a bit of power. Much of this authority comes from the interaction with the file system: it needs to be granted access to read from the first file, and it must be granted access to create and write to the second file.

But, we can accomplish the same task by another phrasing:

shell$ cat < foo.txt > bar.txt

In contrast to the above, cat needs access only to the two file handles that the shell grants it (stdin and stdout). It does not need to interact with the file system in any other fashion.

This example proves itself enlightening when we recognize that within any programming language many phrasings can accomplish the same task. The programmer’s choice of phrasing then becomes a critical aspect of any subsequent analysis. In the above example, even though both programs have the same effect, when we assess the security implications we find very different outcomes. In the case of cp we worry greatly over the potential abuse of authority.
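
The same contrast can be sketched in JavaScript. Everything here is hypothetical: `fileSystem` is a toy in-memory Map standing in for a real file system, and `copyByName` and `copyByHandle` are illustrative names for the cp-style and cat-style phrasings.

```javascript
// A toy in-memory file system standing in for ambient authority.
const fileSystem = new Map([["foo.txt", "some bytes"], ["passwd", "root:x:0"]]);

// cp-style: receives names plus the whole file system, so it could open anything.
function copyByName(fs, from, to) {
  fs.set(to, fs.get(from));
}

// cat-style: receives exactly one reader and one writer, nothing else.
function copyByHandle(read, write) {
  write(read());
}

copyByName(fileSystem, "foo.txt", "bar.txt");

// The caller (playing the shell) opens the files and hands over the handles.
copyByHandle(
  () => fileSystem.get("foo.txt"),
  (bytes) => fileSystem.set("baz.txt", bytes),
);
console.log(fileSystem.get("bar.txt"), fileSystem.get("baz.txt"));
```

Both calls produce the same copy, but only `copyByName` was ever in a position to read `passwd`.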

Reasoning about programs.

Even though the above example concerned itself with a trivial task on the command line, we find the same results when we turn our eyes toward full programs. In order to feel comfortable about running a program we’d like to be able to analyze it first, to be sure that it doesn’t “go wrong”. In particular, my research work focuses on enforcing non-interference. So, “going wrong” means that secret information is potentially inferable from publicly observable information.
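
A small sketch may help pin down what "inferable from publicly observable information" means. It assumes a program is just a function from a secret input to a public output, and the helper `interferes` (a hypothetical name) compares two runs. Comparing runs can only exhibit a leak for the chosen pair of secrets; it cannot prove that no leak exists.

```javascript
// A program here is any function from a secret input to a public output.
// Non-interference asks that the public output not depend on the secret.
function interferes(program, secretA, secretB) {
  return program(secretA) !== program(secretB);
}

// Explicit flow: the secret is copied straight into the output.
const explicitLeak = (secret) => secret;

// Implicit flow: the secret is never copied, only branched on.
const implicitLeak = (secret) => {
  let pub = 0;
  if (secret) pub = 1;
  return pub;
};

// No flow: the output ignores the secret.
const safe = (secret) => 42;

console.log(interferes(explicitLeak, 0, 1)); // true
console.log(interferes(implicitLeak, 0, 1)); // true
console.log(interferes(safe, 0, 1));         // false
```

The implicit case is the one that makes enforcement hard: no secret value is ever assigned to `pub`, yet an observer of `pub` learns the secret.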

With that aim in mind, many of the information flow papers that I read treat non-interference as a lemma within a typing system. In order to understand these papers I started reading Pierce’s “Types and Programming Languages”.

In computer science, we use type systems to construct proofs about our programs. In essence, we want to identify and reject programs that might “go wrong”. Imposing types onto the data allows automated reasoning about a program’s behavior. However, this ability comes at a cost: well-typed programs are a strict subset of all programs. For example, the program:

if (true) then 5 else false;

would be rejected by a strongly-typed system, such as ML, because the result of the if-statement should yield the same type in both branches. Languages, such as JavaScript, that have much more loosely typed systems usually allow such ill-typed programs, and deal with any evaluation errors at runtime.
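
For comparison, a JavaScript rendering of the same conditional (the function name `pick` is only illustrative) runs without complaint, and the type of the result is only known once the branch has been taken:

```javascript
// The same conditional in JavaScript: the two branches have different types,
// and nothing rejects the program before it runs.
const result = true ? 5 : false;
console.log(typeof result); // "number"

// Which type comes back is only known at run time.
function pick(flag) {
  return flag ? 5 : false;
}
console.log(typeof pick(true), typeof pick(false)); // "number" "boolean"
```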

If you want to be sure of some property of a program prior to running it, you must be able to perform a static type analysis that proves the property you are interested in. Fortunately for my research, the non-interference proof can be encoded into a type system. Then, any well-typed program will never “go wrong” in the sense that I gave above.

Unfortunately, I research dynamic languages, in which constructs such as eval, and several mechanisms of polymorphism, prevent a precise static type check. (To preserve soundness, the analysis will have to be conservative, resulting in the rejection of almost all real-world programs.) In the next post, I shall examine to what extent we might still be able to leverage strong typing (automated proving) to enforce security in dynamic languages.

The Wrapper Conundrum

erich — Thu, 21 Apr 2011 23:00:36 +0000

In my information flow research, we have the objective of attaching a security label to every object/value within the running system of a JavaScript VM. Two approaches are immediately evident:

Fat Values. We can extend the native encoding of values to include a pointer to the label attached to that value. In JS, this means that we’d have to have, at minimum, a 64-bit representation. Both SpiderMonkey and WebKit currently use a 64-bit representation, so we’d have to modify the encoding to account for the label pointer. Performing this modification on such a low-level aspect of a VM is not a trivial undertaking, as it will affect many places.
Cloaks (a.k.a. labeled wrappers/proxies). We can also implement the labeling by wrapping each object/value inside another object, whose sole purpose is to carry the label. At first glance, we would expect fewer modifications to be made to the underlying VM, as this only requires the introduction of a new wrapper/proxy class. The difficulty lies in making the wrapper as transparent as possible: it should not be evident (to the JS programmer) that all values are suddenly wrapped up inside labeling objects.

Let’s throw the first curveball that can let us distinguish between these two choices: primitives. In basically all implementations of JavaScript, primitive values are encoded (via tag bits) into a plain old data type. Evidence of this implementation detail occasionally leaks into view of the JavaScript programmer; early 32-bit implementations would auto-convert any integer which could not be represented in 31 bits to a double. From an implementer’s perspective, having primitives makes operations on common types fast, at the cost of introducing some extra special-case logic for each primitive type. Unfortunately, this logic disperses itself across the entire VM.

To give real support to information flow, we will have to come up with a way of tagging both primitives and objects. Primitives represent a challenge because all of their bits are already in use; there simply isn’t any room to add a label pointer to the data payload. It is possible to create such room by changing the way in which primitives are encoded (as introduced in the fat value approach), but it comes at a high cost, because it potentially changes the logic all over the system. Still, there remains a distinct advantage: all datatypes have their label directly attached to their representation (encoded into the payload). Taking this approach it would, with a large amount of engineering effort, be possible to give a labeled, 32-bit implementation of JavaScript using a 64-bit internal datatype.
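
The 64-bit layout can be modeled in a few lines. This is a sketch under stated assumptions, not the encoding of any real VM: each value is assumed to occupy 8 bytes, split into a 32-bit integer payload and a 32-bit index into a hypothetical `labelTable`. Tag bits, doubles, and object references are left out entirely.

```javascript
// Hypothetical fat-value layout: 64 bits per value, split into a
// 32-bit payload and a 32-bit index into a table of labels.
const labelTable = [new Set(), new Set(["user"])];   // index 0 = public

const SLOT_BYTES = 8;
const heap = new DataView(new ArrayBuffer(SLOT_BYTES * 4));

function storeInt(slot, payload, labelIndex) {
  heap.setInt32(slot * SLOT_BYTES, payload);
  heap.setUint32(slot * SLOT_BYTES + 4, labelIndex);
}

function loadInt(slot) {
  return {
    payload: heap.getInt32(slot * SLOT_BYTES),
    label: labelTable[heap.getUint32(slot * SLOT_BYTES + 4)],
  };
}

// Adding two fat values: add the payloads, join the labels.
function addInts(dst, a, b) {
  const x = loadInt(a), y = loadInt(b);
  const joined = new Set([...x.label, ...y.label]);
  let idx = labelTable.findIndex(
    (l) => l.size === joined.size && [...joined].every((p) => l.has(p)));
  if (idx < 0) idx = labelTable.push(joined) - 1;
  storeInt(dst, (x.payload + y.payload) | 0, idx);
}

storeInt(0, 40, 0);      // public 40
storeInt(1, 2, 1);       // 2, labeled "user"
addInts(2, 0, 1);
const sum = loadInt(2);
console.log(sum.payload, [...sum.label]);
```

Even in this toy, every operation that touches a value has to know about the second half of the slot, which hints at why the change spreads across a whole VM.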

Alternatively, we could avoid the cost of changing the data representation at the lowest level, and instead wrap each primitive with an object. Objects can be modified to hold a label in addition to all the other special fields (parent, prototype, etc). This field naturally labels the object, and its contents. To label a primitive, we simply stuff it, and the corresponding label, into a special wrapper class. Clearly, this will make some operations slower, because there will be a layer of indirection when the primitive value is unpacked prior to use.
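
At the JavaScript level, the wrapper idea looks roughly like the sketch below. The class name `Cloak` and its fields are hypothetical, and a real cloak would be a VM-internal class rather than user code; the sketch only shows the indirection and one way the label can get lost.

```javascript
// Hypothetical cloak: an object whose only job is to carry a label
// next to the primitive it wraps.
class Cloak {
  constructor(value, label) {
    this.value = value;
    this.label = label;
  }
  // Lets arithmetic and string conversion see through to the primitive.
  valueOf() {
    return this.value;
  }
}

const wrapped = new Cloak(5, "secret");

// Arithmetic works through valueOf, but the result is a bare number:
// the label is lost unless every operator is taught to re-wrap.
const total = wrapped + 1;
console.log(total, total instanceof Cloak);   // 6 false
```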

During implementation, we discovered yet more difficulties that arise when cloaking primitives. There are many places within the VM that expect and require primitive values (such as the length field of an Array). Furthermore, when a wrapped class leaves the VM (other code is allowed to use JavaScript as a client) that code can become confused if the resulting class, a cloak, doesn’t match what was expected. That is, external code has many assumptions that it will get back a specific type, and cloaking breaks those assumptions.

In order to have success with the cloaking approach, we have to be able to introduce a new class into the VM, and keep it transparent at both the JavaScript and the VM levels. That is, we don’t want any JavaScript code to become aware that a primitive has been wrapped with an object (for example, attempting to set properties should be ignored), nor do we want the VM to become too aware, because that will require special-case logic in too many places. I’ve concluded that solving both of these constraints is nigh impossible. For example, let’s take the result of the typeof operator. In order to maintain transparency at the JS level, the cloak will have to lie about its type. A cloaked integer should return “number” and not “object”. However, the cloak can’t simply return the result of a dispatched call to typeof in every case, because that interferes with many places inside the VM where a switch-case decides what to do based on the type of an internal value. A jsvalue that encodes an object reference, yet returns “number” when its type is inspected, confuses the logic of the VM.
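
Standard JavaScript already shows the JS-level half of this problem with its built-in wrapper objects, which gives a feel for what a cloak would have to hide. The snippet uses only standard language features; it is an analogy, not the VM-level cloak described above.

```javascript
// A primitive and its object wrapper disagree under typeof and ===.
const plain = 5;
const boxed = Object(5);           // standard Number wrapper object

console.log(typeof plain);         // "number"
console.log(typeof boxed);         // "object"
console.log(boxed == plain);       // true  (loose equality unwraps)
console.log(boxed === plain);      // false (strict equality does not)

// A Proxy cannot help directly: its target must be an object.
let proxyError = null;
try {
  new Proxy(5, {});
} catch (e) {
  proxyError = e;
}
console.log(proxyError instanceof TypeError);   // true
```

Proxy handlers also have no trap for typeof, so a wrapper written in JavaScript cannot make itself report “number”; any such lie has to be told inside the VM.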

There is also a strong semantic difference between the two approaches. When using fat values, the label is attached to data when the value is a primitive, and object references when the value is an object (or interned double). When using cloaks the label is attached only to objects themselves. I haven’t fully explored the difference between having a labeled reference vs having a labeled object, but I think the difference is analogous to having an Access Control Listing by columns vs Object Capabilities by rows, as discussed in Capability Myths Demolished.

At this point I am in favor of the fat value approach, because I’m liking the reference semantics, and the transparency with which primitives can be labeled. I’m also willing to accept the cost of having fatter values.

Now for the second curveball: How would you label interned objects? This primarily comes out of Java, but JavaScript does the same thing. With Java, interned objects have an abstraction leak at the == operator. For example, in Java:

Integer x = 5;
Integer y = 5;
Integer a = 10000;
Integer b = 10000;
System.out.println("Integer(5) == Integer(5)? " + (x==y));
System.out.println("Integer(10000) == Integer(10000)? " + (a == b));

---------
Integer(5) == Integer(5)? true
Integer(10000) == Integer(10000)? false

What amounts to an optimization actually hurts our attempts at security. How should we label what are conceptually two different objects, based on their use in the code, if the VM wishes to intern them at the same location? I think it’s pretty clear that having the objects contain their own labels will run us into trouble. Of course, it’s possible to wrap interned objects with a cloak, but I think it would be difficult to decide that an object (as opposed to a primitive) should be wrapped. Essentially, such a wrapper would amount to a labeled reference: exactly what the fat value approach already provides.

In my reading of information flow security so far, I’ve yet to see anyone discussing these kinds of implementation details. Many papers invent some ideal (tiny) language, then perform a proof of non-interference for that language. Without a means for translating real-world languages (with prototype chains, dynamic field lookup, generators, co-routines, exceptions, continuations, etc.) into that ideal model, we haven’t made progress. Other papers tackle a subset of the real language, and claim that the approach extends to the full language; after trying to implement this stuff myself, I seriously doubt this claim. Many language features just don’t play well together, and can seriously upset some of the hidden assumptions that allow a proof of safety for the subset to be constructed.