February 2012 – Fine Shambles

What’s the point of an automated test? I humbly submit that it is to prove that your program does what it’s supposed to.

The things we want to say are incredibly simple. When some data goes in here, it comes back out over there. When it comes out over there, it’s been processed in a particular way.

And yet, in our practice we say neither of these things. We declare things about particular methods[1], and hope that the aggregate effect of the methods is correct. We declare that taking particular actions elicits a particular reaction, and then we have an infinitude of minor variations that mostly take the same paths over and over again[2].

So where is the ambition? Why do we not say what we mean? This data goes from hither to thither; and on its way, it gets processed in such-and-such a way. One might object: the data goes not merely from A to B; it comes from A, then goes to B, C, and D. To which I say, Then we can say that! And Two might object: the data gets processed in one way if it’s going to B, another way if it’s going to C; but there’s this other processing that happens to it, no matter where it’s going. To which I reply, That, too, we can say.

I wish we said these things.

[1] RSpec

[2] Cucumber

For the last few years I’ve been building Rails apps for a living. On some level, they look like this:

A request comes in, and gets picked up by some request handler. A bunch of mutation happens in memory, then a bit of mutation in the database, then some more mutation in memory, then a response gets made. Any of those mutations can potentially interact with almost anything in any other handler. You can reduce the risk of interaction by following various disciplines, but most people don’t, and so I’ve spent too many hours debugging IO happening at surprising times, or data changing in surprising places, or things appearing with the wrong type.

An immutable-everything, pure, strongly-typed language like Haskell offers a potent guarantee. Between those three properties, we are assured: no surprises. Feeling the bloody wounds of that debugging time, I would love that no-surprises guarantee to apply during office hours. Which means I want to get that guarantee in web-land, and so I’ve spent time with Yesod, Ur/Web, Opa; enough time, I hope, to grok the philosophies thereof. I’ve spent time reading about Lift and Noir and Django. I’ve read of the many-flowered garden of academic approaches to web programming and the whole menagerie of Java frameworks for same.

Those first three (Yesod, Ur/Web, Opa) go really hard on the no-surprises properties. They provide a lovely little strongly typed, pure, immutable-everything bubble in which, when a request comes in, you twiddle the data store and send a response out. The guarantee makes me happy, but the experience leaves me a little confused. We’ve eliminated surprising interactions between mutations within different handlers. Interactions between a handler’s mutators are limited by the language’s immutable-everything style. But all the handlers are still interacting with the database in a wild mutation frenzy, and there is a strange and lurking sense that the process is not that different to that of a Rails app. A request comes in, so we change stuff in the data store, and now we have all the potential surprises of mutability. Wha?

The problem is that in choosing to make the request handler strongly typed and pure and immutable-everything, we choose to allow those guarantees to stop at the borders of the request handler. Can we model the application in such a way as to extend these boundaries right out to the border of the application?

Here’s a statement that is always true: the state of the application is a function of the inputs it has received. Historically, that state is maintained by continuously making small incremental updates, but that is in some sense an optimisation. We can define the state as a function of the inputs, and then recompute the state every time we need it. Or, we can use techniques from FRP and incremental computing to keep the state current as the list of inputs changes.
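To make that concrete, here is a minimal sketch in Python of state defined as a pure function of the input log. The ledger events and the names `step` and `state_of` are my own illustrative assumptions, not anything from a real framework.

```python
from functools import reduce

# Hypothetical inputs for a tiny ledger: (kind, account, amount) tuples.
def step(state, event):
    """Pure transition: builds a new state dict and never mutates the old one."""
    kind, account, amount = event
    balance = state.get(account, 0)
    if kind == "deposit":
        return {**state, account: balance + amount}
    if kind == "withdraw":
        return {**state, account: balance - amount}
    return state  # unrecognised inputs leave the state unchanged


def state_of(inputs):
    """The application state, recomputed from every input received so far."""
    return reduce(step, inputs, {})


inputs = [
    ("deposit", "alice", 100),
    ("withdraw", "alice", 30),
    ("deposit", "bob", 5),
]
current = state_of(inputs)  # {"alice": 70, "bob": 5}
```

Because `state_of` is a left fold, applying `step` to the current state gives the same answer as recomputing over the extended log. That is the sense in which the incremental update is an optimisation of the definition rather than a different program.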

If it’s a short-lived application, then you could use incremental computing to define the state function over inputs held in memory, and let that machinery keep the state up to date as you add inputs. It should be possible to implement the same machinery[1] so that it works over data that lives somewhere more permanent, too. Going even further may be an option: we can write a compiler that takes a function from inputs to state, and produces a function that tells you how to update the state for any given change to the input.

If we can do that, then we can write a pure function over the inputs, and get an efficient program. Then, the next time the question presents itself, “how the hell did that happen?”, the possibility space to be explored will be that much smaller. Prevention of bugs, rather than cure.

[1] For each value, you keep track of the values that depend on it and may have to change when it does. When a value changes, you recompute the dependent values and see if they change. Recurse if necessary.
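The bookkeeping in that footnote can be sketched in a few lines of Python. This is a naive in-memory version under my own assumptions: the `Cell` class and its methods are hypothetical names, and a cell reachable by two paths may be recomputed more than once, which a real incremental-computing library would avoid.

```python
class Cell:
    """A value that remembers which other cells depend on it."""

    def __init__(self, value=None, compute=None, sources=()):
        self.compute = compute        # None for plain input cells
        self.sources = list(sources)  # cells this one is computed from
        self.dependents = []          # cells that must be rechecked when this changes
        for source in self.sources:
            source.dependents.append(self)
        if compute is None:
            self.value = value
        else:
            self.value = compute(*[s.value for s in self.sources])

    def set(self, value):
        """Change an input cell and push the change to its dependents."""
        if value != self.value:
            self.value = value
            self._notify()

    def _notify(self):
        for dependent in self.dependents:
            dependent._recompute()

    def _recompute(self):
        new_value = self.compute(*[s.value for s in self.sources])
        if new_value != self.value:   # stop here if nothing actually changed
            self.value = new_value
            self._notify()            # recurse into cells that depend on this one


price = Cell(10)
quantity = Cell(3)
total = Cell(compute=lambda p, q: p * q, sources=[price, quantity])
label = Cell(compute=lambda t: "big" if t > 50 else "small", sources=[total])

quantity.set(4)  # total becomes 40; label is rechecked but stays "small"
price.set(20)    # total becomes 80; label becomes "big"
```

The early exit in `_recompute` is what keeps the work proportional to what actually changed: a dependent whose value comes out the same stops the propagation there.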
