Author: imccoy

3 Approaches

In today’s fantabulous outing, marvel at three approaches to efficient implementation! Ride the wave of perplexity that is the function from a list of inputs to the current state! Gaze in wonder at the persistent storage of that current state!

Option 1 – incrementalisation

In this approach, we take a state function f, that takes a list of inputs and produces current state, and produce an incrementalised state function, that takes a change in the list of inputs to the resulting change in the current state. Liu gives a systematic approach for deriving the incrementalised versions of functions.

Advantages

It enables a neat way of making a highly responsive UI. As well as incrementalising the state function, you also incrementalise the view function (a function from state to HTML). On the client, you can run the incrementalised state function, pass its output to the incrementalised view function, and apply the resulting change to the DOM. (this works)
It lends itself to a really cute way of integrating with a data store: write a function that takes the system state and returns a relational representation of that state. Then incrementalise that function. Any time the system state changes, you can pass that change into the incrementalised relational representation function and hit the database with its output. Shiny! (I haven’t done this yet)

Disadvantages

The wording of Liu’s paper is very careful. What they have is a systematic approach for deriving incrementalised versions of functions. What they don’t have is a fully-automated approach. There needs to be some user interaction or some hinting (in the general case) in order to produce an incrementalised version of a function.
When you have an inner function that uses free variables, and the values of those free variables can be affected by a change, the incrementalised versions start to get pretty hairy and potentially rather inefficient. The current version of my code assumes that any free variables used will not change, which is obviously not true in general.
At the moment I have an informally specified bug ridden implementation of less than half of Liu’s technique.

Option 2 – Partial Evaluation

Partial evaluation is where you take a program of n inputs, then provide some subset of those inputs and evaluate the program with respect to those inputs. This isn’t immediately applicable to this state-updating shenanigans, because there’s no provision there for changing inputs.

Instead of running the partial evaluator directly on a program that takes a list of inputs and returns a state, we can augment the program with a new top level function:

return state(cons(new_input, inputs))

Assuming the partial evaluator unfolds that cons, we’re in with a chance of getting a program that incorporates a new input without doing too much stuffing around. It’s not at all clear how you might get the state into a persistent data store with this approach, but worse then that, the resulting program won’t know how to capitalize on existing bits of state. This is probably surmountable by memoizing computation of bits of the state, but if you’re going to go there…

Option 3 – Adaptive Computation

Adaptive computation (after Acar) is where you think of a program as building up a dependency graph of values. The final output depends on a bunch of intermediate values, which depend on other intermediate values, which ultimately depend on the program’s inputs.

There’s no obvious barrier to storing that whole dependency graph in some sort of persistent store. Obviously, this is going to be a much weirder database than Option 1 gives you; but on the other hand, this approach naturally gives you much more fine-grained change propagation behaviour. That means you don’t have to worry about free variables making stuff confusing. More importantly, it is fully automatable. You just feed your program into the madhouse, and off it goes.

Similar AJAXyness to that of Option 1 is imaginable, but vastly more intricate: you must ship the portions of the dependency graph relevant to the client’s computation down to it. Determining in advance which portions will be relevant is quite likely to be vulnerable to the halting problem, so you’ll have to either send too little (and have a lazy loading strategy to get other data that you need on the client) or too much (and clog the intertubes, OMG).

This option is closest to the approach taken by Asana’s Luna and by Meteor.

Parenthetically: I was amused to discover that with Option 1, the efficient thing to do is to cons a new input onto the head of the list. With this option, the efficient thing to do is to replace the empty list at the end of the input list with a singleton list containing the new input.

Conclusion

I’m sticking with Option 1 for now. I’m about to attempt integrating with a database; if that goes badly, Option 3 is going to start looking much better. Again.

Not In Detention

They’re not in detention, honest! It’s right there in the law (clause 11):

An offshore entry person who is being dealt with under subsection (3) is taken not to be in immigration detention (as defined in subsection 5(1)).

So that’s nice. They’re not in detention! What are they? Well, that’s the question, innit?

They are simply enjoying the hospitality of the Australian government on some island a long way from anywhere, perhaps with no way to leave unless they tell the aforementioned Australian government to send them home.

I say “perhaps”, because there’s no information about what the conditions in these processing places will be. I was looking at the law because I wanted to know what we were doing to protect these people who come to us and ask: please, protect us? I couldn’t find anything, and was hesitant to write about it because I’m not really qualified to read these things. But then I found this. Take us to the bridge, Sarah Hansen-Young:

The Houston report included many important recommendations, none of which this government or the opposition have said any word of since Monday. Where is the commitment to increase our refugee intake immediately? Where is the plan to resettle those who have been waiting for years in Indonesia? Where in this legislation is our commitment to upholding our commitment under the convention to treat people properly, to ensure that they are housed appropriately, to ensure that they have access to legal assistance and to ensure that we do not detain children? None of that stuff, none of those important safeguards which are listed in the Houston report, are in this legislation. The government say that they are implementing the Houston report and they have not even bothered to read the detail of the recommendations.

Bear in mind that I’m not just believing her, I have read the documents. She speaks truth. The legislation enables the first and worst recommendation, offshore deten^h^h^h^h^hprocessing, and government and parliament have done nothing to ensure conditions in those places are tolerable. Nor have they done anything with the report’s suggestions to improve the queue that asylum seekers are supposedly jumping.

Nothing about Houston Report 3.46’s appropriate accommodation, or its physical and mental health services. Nothing about its educational and vocational training, its monitoring arrangements and case management assistance. And most worryingly, nothing about its application assistance or its appeal mechanism.

Nothing about the Report’s 3.22’s increase in places for folks currently in Indonesia, or its “practical agenda of initiatives”, or 3.26’s capacity-building of UNHCR processes in the region.

Nothing.

Please, please, prove me wrong.

But enough of the heavy stuff! Let’s have some comic relief. From Migration Legislation Amendment (Regional Processing and Other Measures) Bill 2012, aka the Stop The Boats bill:

198AC Documents to be laid before Parliament

(1) This section applies if the Minister designates a country to be a regional processing country under subsection 198AB(1).

(2) The Minister must cause to be laid before each House of the Parliament:

(a) a copy of the designation; and

(b) a statement of the Minister’s reasons for thinking it is in the national interest to designate the country to be a regional processing country, referring in particular to any assurances of a kind referred to in paragraph 198AB(3)(a) that have been given by the country; and

(c) a copy of any written agreement between Australia and the country relating to the taking of persons to the country; and

(d) a statement about the Minister’s consultations with the Office of the United Nations High Commissioner for Refugees in relation to the designation, including the nature of those consultations; and

(e) a summary of any advice received from that Office in relation to the designation; and

(f) a statement about any arrangements that are in place, or are to be put in place, in the country for the treatment of persons taken to the country.

In other words, there’s a whole lot of stuff that will help the parliament to make the judgement about if it’s a good idea to send refugees to this country. The minister must provide it. Take particular note of that word “Must”, the third word in (2). Onwards:

(3) The Minister must comply with subsection (2) within 2 sitting days of each House of the Parliament after the day on which the designation is made.

What? What are you talking about? 198AB said that if the document wasn’t explicitly rejected by either house within five days, it would be accepted. I’m not clear on how “within 2 sitting days” is counted by these folks, but if there’s scope for the minister to provide the documents at the end of day 2 out of 5, that leaves 3 days for the houses to accept or reject. They’ve built a stalling tactic right in to the legislation! But that’s okay, surely it can’t get any worse…

(4) The sole purpose of laying the documents referred to in subsection (2) before the Parliament is to inform the Parliament of the matters referred to in the documents and nothing in the documents affects the validity of the designation. Similarly, the fact that some or all of those documents do not exist does not affect the validity of the designation.

The first sentence says it’s to help you decide about the designation, it doesn’t define or otherwise affect the designation (in network protocol standards, we’d say it’s not normative). Fair enough. But the second sentence seems to come awfully close to giving the Minister a pass on justifying the designation of a country. I thought it couldn’t get any worse? Surely now…

(5) A failure to comply with this section does not affect the validity of the designation.

Ah, this one is simple. The word ‘must’ in clause (2) was for cosmetic purposes only. The Minister can not only stall (as per (3)) or partially comply (as explicitly provided for by (4)), he can in fact skip the whole bit.

See? Comic relief, just like I promised. It must be funny, because the primal law of “if you don’t laugh, you’ll cry” is in effect. It’s in effect so hard. Still, at least it’s better than the Morrison amendment.

P.S. Big ups to the folks maintaining the Australian Parliament House website, by the way, you guys rock my world. Austlii’s consolidated legislation listings rate a mention here too, it’s good stuff. If you think the law is good stuff, that is.

P.P.S There is a more charitable reading of 198AC. The pre-amendment rules are that the minister gets to designate places of off-shore processing; in the worst case, the new rules degrade back to that situation.

The Asylum Seeker Trade-off

I wrote earlier that the trade-off between deaths at sea and possible trauma in detention at Nauru could potentially be justified. That’s predicated on a pretty big assumption, the assumption that there are only two options to choose between. Rather than a policy that may encourage people to come by borderline unseaworthy boat, or a policy of deterrence, how about a policy where safe transport from Indonesia to Australia is provided by the Australian government? This policy is often mentioned, but usually with scorn: “what are we supposed to do, just charter a plane to bring them here?” I am interested in actual reasons not to do this.

The Houston Report doesn’t really answer my question, although it provides a hint. Buried in the document, deep within the sweaty groin of Attachment 6: “Australia’s International and Regional Engagement on Irregular Movement and International Protection”, is this note about practical arrangements developed under the Regional Cooperation Framework of the Bali Process on People Smuggling (etc etc): “Any arrangements should avoid creating pull factors to, or within, the region.” Creating a safe and effective mechanism for people who would otherwise be Irregular Migrant Arrivals would create a big-ol’ pull factor to Indonesia, and Indonesia doesn’t want that.

That’s all reading between the lines. The report doesn’t really talk about the possibility. But – should my reading be accurate – Indonesia’s distaste for the notion would probably not be absolute. Could they be persuaded?

Limits

The Houston Report advises re-opening Nauru and Manus Island as off-shore processing centres for asylum seekers. From my point of view, that sucks. I try to be all love-your-neighbour, yo, and shipping your neighbour off to a tiny island – where they are to stay for an unknown period of time – is not real loving. How did such a sucky thing come to pass?

There were three people on the panel that produced the report: Angus Houston, a retired air force man; Professor Michael L’Estrange, an academic and public servant with a rightish bent; and Paris Aristotle, a man with the most awesome name ever but also the Director of the Victorian Foundation for Survivors of Torture. Mr Aristotle’s schtick is supporting and advocating for refugees.

When asked about the possibility of trauma in the new Nauru processing centre, Mr Aristotle compares the potential trauma of prolonged presence on Nauru to that of losing relatives at sea. Given that the rate of boats being lost at sea en route to Christmas Island has increased recently, this issue of asylum seekers dying at sea is a real concern, and the trade-off is potentially justified.

Alas, alack, we can explain all this without considering the trade-off. The panel was assembled to answer a question, and that question takes the form of their Terms of Reference for the report. The question they were asked, Numero Uno, their raison d’être: “how best to prevent asylum seekers risking their lives by travelling to Australia by boat”. Or, as Tony Abbott would put it: “Stop the Boats”. There were other things they were asked to write about:

source, transit and destination country aspects of irregular migration
relevant international obligations
the development of an inter-related set of proposals in support of asylum seeker issues, given Australia’s right to maintain it’s borders
short, medium and long term approaches to assist in the development of an effective and sustainable approach to asylum seekers
the legislative requirements for implementation
the order of magnitude of costs of such policy options

None of these things are the physical or mental well-being of asylum seekers. None of these things are about respecting and caring for vulnerable people looking to us in a time of need. These questions were not asked.

They are kind of between the lines. Our “relevant international obligations” include some treaties that bind us to a certain minimal level of not being a dick. Imposing irreparable psychological damage probably isn’t a sustainable approach to asylum seekers. The well-being of asylum seekers might be considered an “asylum seeker issue” for proposals to be in support of. But this is all implicit stuff, it doesn’t rate a mention in the terms of inquiry.

Compare and contrast this with the lead entry, the Stop the Boats priority. This requirement can be read out of the “inter-related set of proposals” requirement, the “effective … approach to asylum seekers” terms, and maybe even the “destination country aspects”. But to leave it hidden away in there would be to allow the report to come up with any answer at all. The answer that might perhaps buy Tony Abbott’s silence on the issue is off-shore processing, and so the question must focus on deterrence. And thus we asked the panel not to advise us on how to humanely and effectively treat asylum seekers in Australia, but on how to stop them coming.

Paris Aristotle never had a choice. He can be as hard-headed as he wants; he is bound by the terms of reference to answer the wrong question. He is bound to give an answer focused on stopping the boats. He may also think it a justified trade-off in the present circumstances, I don’t know. But it doesn’t matter. His job was to answer the question he was asked. He could have affected the details, but the overall thrust of the report? He never had a choice.

Well, that was fun

It appears that if

you’re using Android’s HttpURLConnection (2.3 is the specific source of the grievance), and you’re talking to certain web servers (ahem, in this case nginx with varnish in front of it, if the headers are to be believed).
you call setDoOutput(true)
you call setChunkedStreamingMode()

then calling getOutputStream() will cause your later call to getResponseCode() to return a 503 HTTP_UNAVAILABLE.

Isn’t that nice? My fix was to precompute the POST data so I could call setFixedLengthStreamingMode() with its length instead.

PS. You know how I said it was fun? I lied.

Do The Shuffle

My grand purely-functional web application architecture project continues, and today I’d like to talk about a dictionary webapp. It’s very simple, having only two types of inputs, one of which creates words and one of which creates definitions.

type Word = String
type Definition = String
data Input = NewWord Word | NewDefinition Word Definition

The system state is a list of words, where each word is accompanied by a list of definitions (using some unsurprising helper functions, defined at the bottom of this post):

state :: [Input] -> [(Word, [Definition])]
state inputs = map word (filter isNewWord inputs)
  where word w = (w, definitions w)
        definitions w = definitionsFor w inputs
definitionsFor w = (map definitonFrom) .
                     (filter $ (==w) . wordFrom) .
                     (filter isNewDefinition)

That is, for each word-creating input, make a word entry. Gather all of the relevant definition entries into that word entry.

And from this I want to produce an implementation that does the minimum necessary to update the state when the inputs change. For an implementation to be practical, a NewDefinition input for a word should only impact on values that are directly related to that word. From that perspective, our state function is trouble. In particular, that list of (Word, [Definition]) is trouble.

Here’s why: The simplest thing that I can think of to produce a state-updating version of the function is to use a nice simple top-down algorithm. So from a regular map function,

map f [] = []
map f (x:xs) = (f x):(map f xs)

We produce something like this (where f’ is a state-updating version of f):

map' f' [] = []
map' f' (x:xs) = (f' x):(map' f' xs)

This implementation has an unpleasant implication. For any (f x) to change, every value in the initial x:xs has to be looked at. This is bad news from a practical-implementation point of view, because it means that adding a definition for word w actually has to touch the list elements for all of the words created prior to (or after, which one it is doesn’t matter all that much) w. It is quite possible to make an implementation like this that actually works [1], if you are not worried about the inevitable failure to scale.

At the same time, this bad news is no news at all, since immutable-everything value semantics demand that changing a value within a list also changes all of the lists that contain that value. That is, if we have a list (1:(2:(3:[]))), an update to 3 can only be made by making an update to the cons cell with value 2, and that update in turn can only be made by making an update to the cons cell with value 1. Those are the rules of the game.

Changing the Game

Why do implementations of the map-reduce framework not have this problem? After all, they are all about having a huge pile of input and generating some number – possibly a similarly huge pile – of outputs. How do the inputs find their way into the relevant outputs? When we say map-reduce, we are omitting the middle step, that of shuffling. The result of the computation is not `reduce (map input)’ but rather `reduce (shuffle (map input))’. The Map stage produces a pile of (key, value) pairs, the shuffle stage arranges those by key so you have a pile of (key, [value]) pairs, then each of those (key, [value]) pairs is reduced. The shuffle stage is all about sending values off to where they will have an effect. So inspired, we can introduce a function shuffle:

shuffle :: (Ord k) -> [a] -> (a -> k) -> ([a] -> v) -> Map k v
-- the type signature kind of says it all for me, but 
-- here's an implementation anyway
shuffle is fk fv = foldr (\i m -> addAtKey i (fk i) m) Map.empty is
  where addAtKey i = Map.alter (Just . maybe [i] (i:))

That is, for some collection of things `[a]’ we have a function that says which bucket they should go in, and a function that says how to produce a value from the members of a bucket, and from all those things we produce a map of buckets to values. This idea is so simple and so obvious that it took me a month to think of it.

The point of all that is of course that we can implement shuffle’:

shuffle' (\is -> i:is) fk fv fv' = Map.alter f (fk i) v
  where f Nothing = Just $ fv [i]
        f (Just v) = Just $ fv' (\is -> i:is) v

So when we have an as-yet empty bucket (the `f Nothing’) case we just compute fv on the single element. When we have a bucket with something in it, we compute the effect of the bucket input’s change on the bucket’s output, apply that change to the bucket’s output, and just like that your uncle’s name is Robert. Probably. This is implementable too, and we can rework our dictionary app to use it quite readily:

definitions inputs = Map.mapWithKey definitionsFromInputsFor
                                    (shuffle wordFrom inputs)
  where definitionsFromInputsFor w inputs
            = map definitionFrom (definitionInputsFor w inputs)

We have lost something on the way, though. At the start of this post, we could write any function we liked to express our program’s state. Now, the set of functions we can write to do that is severely curtailed. On the other hand, they are efficiently implementable [2]. Will that severe curtailment turn out to be a problem?

It raises another question. Is `shuffle’ actually giving us much, or is the win coming from some special privileges being given to the Map data structure? Maybe representing system state as a list is just a stupid thing to do (exhibit A, the vastly simpler shuffleized implementation).

On both of these questions, further work (as the academics like to say) is required.

Oh, right. Here’s those unsurprising helpers I was talking about. They’re, like, totes boredom:

isNewWord (NewWord _) = True
isNewWord _ = False
isNewDefinition (NewDefinition _ _) = True
isNewDefinition _ = False
wordFrom (NewWord w) = w
wordFrom (NewDefinition w _) = w
definitionFrom (NewDefinition _ d) = d
definitionFrom _ = undefined

[1] Beware, the README is something of an unreliable narrator.
[2] Unverified claim.[3]
[3] STFU.

Android and Predexed Clojure

This is a continuation of Android and clojure, 2 apks and a proxy.

I finally managed to get acceptable build times by pre-dexing the clojure jar. You can see the good sauce over at github, in DexterActivity.java. It turns out there are 3 goats to sacrifice.

Firstly, the DexClassLoader takes a filesystem path, but the pre-dexed jar is a resource in the APK. The pathForApk method copies the apk out of the resource and into a cache directory that the DexClassLoader can get at.

Next, you need to make sure that Clojure uses the right class loader. The way I made this happen is by building the Clojure runtime into one apk, my clojure code into another apk, then using a single DexClassLoader for both apks. Then, tell the Clojure runtime to use that DexClassLoader. That juju starts at line 36.

Finally, you need to use reflection to load up the clojure classes. Line 43 is the responsible party.

And it all works. This drags down the build time to within reasonable limits. Alas, the startup time is still too expensive. If I tilt again at this windmill, it will be to break up the clojure jar into a bunch of smaller APKs, and write a class loader that loads those smaller APKs lazily.

Wine Theorem

We define a glass of wine W as containing sips w₀, w₁, w₂, …, w_n. For two sips w_n and w_m, n < m if the drinker consumed w_n earlier than w_m. Note that is <, strictly less than, because if n = m the same sip is being consumed twice. While not unknown, this phenomenon usually does not appear until after more than one glass, and in any case is irrelevant to this study.

We further define a function f, which denotes the flavour of the wine. The better the wine tastes, the larger the value of f. Negative values are regrettably possible.

The Wine Theorem

The Wine Theorem is simply: f(w_n) > f(w₀)

The proof is left as an exercise to the reader[1].

Further Conjecture

Conjecture: as n approaches infinity, f(w₀) ÷ f(w_n) approaches zero.

Attempts to prove this conjecture are not encouraged.

Conclusion

The last sip of the glass tastes an awful lot better than the first.

[1] the proof is notably simpler when attempted with a glass of cheap wine in hand.

Testing is pooched

What’s the point of an automated test? I humbly submit that it is to prove that your program does what it’s supposed to.

The things we want to say are incredibly simple. When some data goes in here, it comes back out over there. When it comes out over there, it’s been processed in a particular way.

And yet, in our practice we say neither of these things. We declare things about particular methods[1], and hope that the aggregate effect of the methods is correct. We declare that taking particular actions elicits a particular reaction, and then we have an infinitude of minor variations that mostly take the same paths over and over again[2].

So where is the ambition? Why do we not say what we mean? This data goes from hither to thither; and on its way, it gets processed in such-and-such a way. One might object: the data goes not merely from A to B; it comes from A, then goes to B, C, and D. To which I say, Then we can say that! And Two might object: the data gets processed in one way if it’s going to B, another way if it’s going to C; but there’s this other processing that happens to it, no matter where it’s going. To which I reply, That, too, we can say.

I wish we said these things.

[1] rspec

[2] cucumber

Stopping at the border

For the last few years I’ve been building rails apps for a living. On some level, they look like this:

A request comes in, and gets picked up by some request handler. A bunch of mutation happens in memory, then a bit of mutation in the database, then some more mutation in memory, then a response gets made. Any of those mutations can potentially interact with almost anything in any other handler. You can reduce the risk of interaction by following various disciplines, but most people don’t, and so I’ve spent too many hours debugging IO happening at surprising times, or data changing in surprising places, or things appearing with the wrong type.

An immutable-everything, pure, strongly-typed language like Haskell offers a potent guarantee. Between those three properties, we are assured: no surprises. Feeling the bloody wounds of that debugging time, I would love that no-surprises guarantee to apply during office hours. Which means I want to get that guarantee in web-land, and so I’ve spent time with Yesod, Ur/Web, Opa; enough time, I hope, to grok the philosophies thereof. I’ve spent time reading about lift and noir and django. I’ve read of the many-flowered garden of academic approaches to web programming and the whole menagerie of java frameworks for same.

Those first three go really hard on the no-surprises properties. They provide a lovely little strongly typed and pure and immutable-everything bubble for you to send a response out and twiddle the data store when a request comes in. The guarantee makes me happy, but the experience leaves me a little confused. We’ve eliminated surprising interactions between mutations within different handlers. Interactions between a handler’s mutators are limited by the language’s immutable-everything style. But all the handlers are still interacting with the database in a wild mutation frenzy, and there is a strange and lurking sense that the process is not that different to that of a rails app. A request comes in, so we change stuff in the data store, and now we have all the potential surprises of mutability. Wha?

The problem is that in choosing to make the request handler strongly typed and pure and immutable-everything, we choose to allow those guarantees to stop at the borders of the request handler. Can we model the application in such a way as to extend these boundaries right out to the border of the application?

Here’s a statement that is always true: the state of the application is a function of the inputs it has received. Historically, that state is maintained by continuously making small incremental updates, but that is in some sense an optimisation. We can define the state as a function of the inputs, and then recompute the state every time we need it. Or, we can use techniques from FRP and incremental computing to keep the state current as the inputs list changes.

If it’s a short-lived application, then you could use incremental computing to define the state function in memory on the inputs, and let that machinery keep the state up to date as you add inputs. It should be possible to implement the same machinery[1] so that it works over data that lives somewhere more permanent, too. Going even further may be an option: we can write a compiler that takes a function from inputs to state, and produces a function that tells you how to update the state for any given change to the input.

If we can do that, then we can write a pure function over the inputs, and get an efficient program. Then, the next time the question presents itself, “how the hell did that happen?”, the possibility space to be explored will be that much smaller. Prevention of bugs, rather than cure.

[1] for each value, you keep track of values that depend on it and may have to change when it does. When a value changes, you recompute the dependent values and see if they change. Recurse if necessary.