From Logic Variables to Type Variables

When we were here last, I outlined how to write most of a type inferencer in haskell. I outlined a 5-step plan, and charted a course through the treacherous waters that stopped just shy of the fifth step: “Remove all the unification variables, replacing them with TyVars and introducing Foralls as appropriate to get into roughly the form described above.” And then I stopped! Let’s talk about how to go from an expression tree that is full of unification-fd’s ​​UVars to one that explicitly introduces Foralls in appropriate places.

What does it mean to say something is an appropriate place to introduce a Forall? Remember that the point of these is to declare where the type machinery has to commit to a particular type being used consistently for a type variable. If we consider haskell’s map function (a -> b) -> [a] -> [b], once we commit to a particular type a in either argument we’re locked in to using that type in both arguments. By the same token, using a type for b in the result requires you to use the same type in the first argument, and vice versa. So we need to put both our foralls at the start, yielding forall a b. (a -> b) -> [a] -> [b].

It’s not always necessary to put all the ​foralls at the start. For example, consider a function that takes an argument of type c and returns a function that takes an argument of type d, ignores it, and returns the initial argument c. This is probably const and has type  c -> (d -> c). In this case, as soon as you pass in the first argument, we’re committed to the final return type. We’re not committed to the type of the second argument, though! So we end up with forall c. c -> (forall d. d -> c). The value you get after picking a type for c and applying a value of that type as the first argument has a forall in it, which reflects the fact that you still get to choose what type will substitute in for the type variable  d.

Each of those examples reflects one of my priorities in cooking up a function for introducing Foralls. The first is an example of making Foralls as specific as necessary. The second is an example of making them as general as possible. Putting that a little more concretely, I want to put the Forall for each logic variable as far down in the AST as possible while still only having one Forall per logic variable.

In attempting to write a function to do that, I found it more difficult than I was anticipating, so I want to talk first about a simple thing based again on Diehl’s Write You A Haskell tutorial. In that tutorial – and in my type inference code – the only time you introduce the sort of flexibility that demands Foralls is around let bindings. That means you can get a good result by structuring your Forall introduction around those let bindings. I categorised expressions that were either at the top level of the program, or at the top of an expression bound within a let clause, as Top expressions. To introduce Foralls to a Top expression, you find all of the type variables in that expression’s type and those of its non-Top sub-expressions, and make Foralls for them. When looking at a non-Top expression, you’ve already introduced Foralls for all its type variables and only need to rewrite the logic variables to type variables. You can see all the gory details in the commit with the code.

At this point I had working code, but I was distinctly dissatisfied with it. Can we allow more generality than that? I think we can!

The key thing is that for any expression, you need to introduce type variables for any logic variable that occurs in two or more child expressions. This gets you 90% of the way and covers pretty much all of the obviously useful functions. There are some funny functions, like const :: a -> b -> a, where that doesn’t cut it. I repair those cases by also introducing type variables for any logic variable that occurs in the type of the expression I’m looking at right now. Here’s the diff with all the particulars.

This implementation seems to work, but it’s definitely not perfect. I know it leaves a certain amount of generality on the table because of the way it handles functions like const. By introducing Foralls for every variable in the type in one go, we lose the opportunity to put a forall further down the type like in the const :: forall c. c -> (forall d. d -> c) motivating example.

It might be possible to solve this by only introducing type variables when a logic variable appears in two sub-expressions of a UTerm Ty, but that would create the potential for another problem. Consider an implementation const x = \_ -> x. We know the expression \_ -> x has type forall d. d -> c, where the type variable c has been brought into scope previously. Knowing that means it’s safe for the type of const to be forall c. c -> (forall d. d -> c). In general, we know that an expression \(x :: a) -> (y :: β) will have a type a -> β. If an algorithm introducing foralls to that type leaves foralls within the type of β, its correctness depends on putting foralls in the same place when processing the type of the expression y :: β. I do not have confidence that my algorithm has that property!

I duck this question entirely at the moment, because we err on the side of introducing foralls earlier than necessary, and at that point the type variables they introduce go into the type environment for sub-expressions. That means we have the same type variable assignments in the environment when we’re introducing foralls to the type of this expression as we do when we’re working on the sub-expressions, which ensures that we come up with the same types for both.

I do have a nagging question, though… does this algorithm that I thought was so much more sophisticated actually just do the same thing as the let-centric algorithm?

In any case…. that’s a wrap! We now have a type inferencer that goes all the way from our expression language to our type language. Sweet!

Advertisements

A Type Inference Implementation Adventure

I stumbled across a blog post entitled “Generic Unification” a while ago. It makes this claim:

The unification-fd package by wren gayle romano is the de-facto standard way to do unification in Haskell. You’d use it if you need to implement type inference for your DSL, for example.

This was incredibly relevant to my interests! At the time I had been thinking about how to do some sort of minimal type inference for A Thing, and discovering that it was harder than I thought it was. Running into a library to which I could offload half of the work was pretty sweet.

I wrote code that used the library to do type inference for The Thing, and it kinda sorta worked a bit. It wasn’t a great success, though, and I kept discovering things that I didn’t really understand. Continually running into things I don’t understand made it seem like a good time to step back and do a learning exercise, which I christened BabyTC. It’s a typechecker for a simple language like you might do as an exercise in a PL implementation course. This post is about the problems I ran into while I was doing this, and how I got through them. If you know a bit about haskell and a bit about how type inference works and are interested in some of the details of making it go, then this post might just be for you!

What Language Are We Checking?

The language is a super-simple lambda calculus sort of thing. A program is a single expression. In haskell syntax, the language looks like this:

data Expr = Lam Text Expr
          | App Expr Expr
          | Var Text
          | Let [(Text, Expr)] Expr
          | Number Integer
          | Text Text

Informally, the meaning of this language goes something like this:

  • The expression can be a Lam, which has a name and an expression. When an argument is applied to the Lam, the expression will be evaluated with the argument bound to the name.
  • The expression can be an App, in which case the App‘s second argument will be applied to the first. The first is going to have to evaluate to a Lam for this to work!
  • If the expression is a Var, then it evaluates to whatever the Var‘s name is bound to.
  • When the expression is a Let, then the first argument is a list of names and expressions. The result of the expression is the second argument, evaluated with every expression in first argument bound to the corresponding name. This was a late addition. We’ll come back to this soon.
  • If the expression is a Number or a Text, then it just evaluates to the value in the expression.

It’s common in things like this to have a Case statement of some sort. I’ve deliberately forgone that for now – I’m happy to inject an ad-hoc library of functions that will constrain things to be Numbers or Texts and provide ways to manipulate them and move between them. It’s also usual to provide a more formal explanation of the semantics, but I don’t really trust myself to get that right. Let me know if you’d like a pointer to a similar language with a more formal specification!

What does unification-fd give us?

If unification-fd is the tool we’re going to use to try to crack this nut, we’d better look at what exactly it gives us. We have:

  • A type, UTerm, representing either a variable or a thing in some language we define
  • A typeclass Unifiable to let us say whether two things in the language we defined can be unified
  • Ways to create, bind and look up variables (the functions of the BindingMonad typeclass)
  • Ways to query the relationship between terms (===, =~=)
  • Ways to assert that terms have particular relationships (=:=, <:=)
  • A Big Red Button (applyBinding,applyBindings)

The post that kicked off this whole caper, Roman Cheplyaka’s Generic Unification, and wren gayle romano’s tutorial on unification-fd cover how all those bits fit together, so I won’t talk about that here. How can we use these bits to do type inference?

What does a type look like?

I started with something like this classic:

data Type = LamTy Type Type
          | NumberTy
          | TextTy
          | TyVar Text
          | ForallTy Text

A LamTy has an argument type and a result type. NumberTy and TextTy are our primitive types. We don’t need an AppTy, because when you write an App expression its type is the type of the LamTy in its first argument after applying the second argument to it.

So what’s the deal with this TyVar and ForallTy business? If you’re writing a map function, it might have a type like (a -> b) -> [a] -> [b]. The a and b in that type are type variables, but where do they come from? It matters because the point where the variables are introduced determines how much flexibility the consumer of the map function has in picking types to send it. Foralls are a way of specifying where type variables come from. If we explicitly put in some foralls to that map function, we get forall a b. ((a -> b) -> [a] -> [b]). What that means is that you can pick any a and b you want, give it a function that takes an ato a b, and get a function that takes a list of the same as to a list of the same bs. But once you commit to an a and a b, you can’t pick another b halfway through. On the other hand, I can write down the type forall a. ((a -> forall b. b) -> [a] -> [forall b. b]) but it’s not going to end well. The caller of this function has to pass in something that can return a value of any type, then the function will presumably only use it to return values of the type the caller expects this function to return a list of. We’ve introduced a demand for more flexibility than we need, which makes problems for the consumer of the map function. If we commit to b early, we know we’re talking about the sameb throughout the whole type signature, and we don’t get those problems.

Our goal is to have a type like this associated with every expression in our program. Depending on exactly what your goal is, you might want to then throw away all except the top-level type, and that’s ok! Our general plan of attack is:

  1. Create a type variable for each expression
  2. Gather information from up and down the expression tree to constrain those type variables and feed it to unification-fd
  3. Ask unification-fd to do its thing and find values for all the type variables
  4. Dig through the variables until we have the actual type for each expression (which may still be a variable!)
  5. Remove all the unification variables, replacing them with TyVars and introducing Foralls as appropriate to get into roughly the form described above.

Rearranging Some Preliminaries

You know that Expr type I gave you before?

I lied.

We want to associate a type with not just the program as a whole, but with every expression in the program. That means rejigging the Expr type so that we associate some type information with every node. In another project, I play a recursion-schemes song that I considered reprising here, but I wondered if something simpler might be fun. The type I wound up using is as follows:

data Node t = Node t (Expr t)
data Expr t = Lam Text (Node t)
            | App (Node t) (Node t)
            | Var Text
            | Let [(Text, Node t)] (Node t)
            | Number Integer
            | Text Text

(plus a handful of derivings: Functor, Foldable, Traversable, Show, Eq. The Traversable one has the fun property of making the compiler hang if optimisations are turned on). 

Now, when I initially make an Expr I make an Expr (). We can allocate an empty unification variable to point to the type of each Node like this:

allocateTyVars :: forall t v m. BindingMonad t v m 
               => Node () -> m (Node v)
allocateTyVars = mapM $ \() -> freeVar

That’s step 1 finished!

I’m afraid I wasn’t just lying about the Expr type. I lied about the Type type too. Unification-fd expects us to provide an algebra that gives us a language to talk about what values our unification values can hold. The following gives a TyF that works as an algebra for unification-fd, and works out to be loosely equivalent to the Type I listed above but without the ForallTys.

data TyF f = LamTy f f 
           | NumberTy
           | TextTy

type Ty v = UTerm TyF v

lamTy :: Ty v -> Ty v -> Ty v
lamTy argTy bodyTy = UTerm $ LamTy argTy bodyTy

numTy :: Ty v
numTy = UTerm NumTy

textTy :: Ty v
textTy = UTerm TextTy

Gathering and Solving Constraints

My initial assumption was that I would use unification-fd’s functions to build up a list of constraints and variables, then pass that list of constraints to a solver and get a set of variable assignments back. This is not how it works! The library uses a state monad and builds up a list of variables and constraints inside it, and when you use =:= it saves a new constraint. Unifying two terms does return a new term, but you don’t need to keep that term around (although the documentation suggests that you can get better performance by using it in place of either of the now-unified terms).

We’re going to walk through our Node/Expr tree, look at every expression, constrain things appropriately, and bind the resulting type to the logic variable in the node.

constrain :: (BindingMonad t v m, Fallible t v e, TinyFallible v e, MonadTrans em, Functor (em m), MonadError e (em m), t ~ TyF, Show v)
          => Map Text (NeedsFreshening, Ty v) -> Node v -> em m (Ty v)
constrain tyEnv (Node tyVar expr) = do ty' <- go expr
                                       lift $ bindVar tyVar ty'
                                       pure ty'
  where go (Lam argName lamBody) = do argTy <- UVar  lift freeVar
                                      bodyTy <- constrain (Map.insert argName (SoFreshAlready, argTy) tyEnv) lamBody
                                      pure (UTerm $ LamTy argTy bodyTy)

When we’re looking at a Lam, we don’t know much about the argument type. We can learn about it both from the body of the function and from the way the function is called. Since we don’t know much about it right now, we just generate a fresh variable for it and stick that variable in the environment while we figure out lamBody. That means the process of gathering constraints for lamBody both gives us a type to be the Lam‘s bodyTy and discovers the details we need for that argTy.

        go (App funExp argExp) = do argTy <- constrain tyEnv argExp
                                    funBodyTy <- UVar  lift freeVar
                                    funExpTy <- constrain tyEnv funExp
                                    unify (UTerm $ LamTy argTy funBodyTy) funExpTy
                                    pure funBodyTy

This was by far the most interesting stanza to figure out. The key to understanding what’s going on here is that we know the type of funExp is going to be a LamTy. If we set that LamTy’s argument type to be the type of argExp, then the type of that LamTy‘s body will be the type of the App expression as a whole.

My first impulse was to get the type of funExp, pattern match on it to pull the actual argTy and funBodyTy from its LamTy, and go from there. That doesn’t work because when this code runs funExp doesn’t have a type for us to get. Instead, we concoct a new LamTy with an argTy that matches this argExp‘s type and a new variable to be both that LamTy‘s return type and the return type for this App expression as a whole. We gather constraints for funExp, and unify the type we get for funExp with this made-up LamTy. It seems to work!

        go (Let bindings bodyExp) = do bindingTys  ((name,) . (NeedsFreshening,))  constrain tyEnv exp) bindings
                                       constrain (Map.union (Map.fromList bindingTys) tyEnv) bodyExp

This might be a good time to talk about why we have these Let expressions. It’s mostly a way to support parametric polymorphism.

I want my program to be able to have terms whose type can vary, and for multiple uses of that term to be able to specialise it to different types. For example, let’s say we have the identity function Lam "x" (Var "x") . We know that this function returns a value of the same type that is passed in to it, yielding a type something like forall a. a -> a in a haskellish syntax or ForallTy "a" (LamTy (TyVar "a") (TyVar "a") in our ADT above. If we asked unification-fd to infer the type for the identity function in isolation, we expect it to give us a type LamTy v v, where v is some logic variable. One part of our program might use it as aLamTy NumTy NumTy , and another as a LamTy TextTy TextTy.  In other words, we want our program to be able to define the function once, and use it multiple times, and have it adopt different values for the type variables each time.

If we don’t do anything to make this work, what happens? When one part of the program uses this identity function, we unify the function’s argument and return types with particular concrete types. Then, when the other part of the program tries to use it with a different type, it won’t be able to unify it. It’s too busy being a Text to hang out with the Numbers!

My first attempt at a solution was to insert a freshen in my App constrainer so that I unified my made-up term with a freshened funExpTy. The call to freshen replaces every logic variable in the term with a new, fresh logic variable, with the new fresh variables sharing the same relationships between them as the existing variables. This solved the problem, but the cure was worse than the disease. One of the ways we get information about the type of a LamTy is from what happens to the Lam when we apply an argument to it and use the result. When we freshen funExpTy we break the connection with the original LamTy. Instead of gaining information about the original LamTy, that new information is associated with the freshened copy and never makes it back.

My solution to this – which I am not at all sure is correct, but it works for at least one simple case – was inspired by Stephen Diehl’s Write You A Haskell‘s choice to treat the variable assignment in a Let quite differently to that in a Lam. Instead of defining an environment of types, he defines one of type schemes. When a Lam inserts a type scheme into the environment, it does nothing at all with type variables and so forces all of the users of the type to agree on exactly what it is. When a Let inserts a type scheme into the environment, it does so in such a way that anyone pulling the type scheme from the environment gets to decide anew where the type variables where land.

Instead of a formally-coherent notion of type schemes, my riff on this just has an environment of tuples. When Let inserts a type into the environment, it inserts a tag of NeedsFreshening alongside it. When a Var inserts a type into the environment, the tag SoFreshAlready is used, because this is a very serious and highly professional project.

        go (Var text)
         | Just (NeedsFreshening, varTy) <- Map.lookup text tyEnv = freshen varTy
         | Just (SoFreshAlready, varTy)  <- Map.lookup text tyEnv = pure varTy
         | Nothing                       <- Map.lookup text tyEnv = throwError $ undefinedVar text tyVar

When we look up the type of an expression-level variable, we inspect the tag alongside the type. If it needs freshening – which means it came from a let binding – then we freshen it before returning it. That resolves the issues about polymorphism I was talking about before.

        go (Number _) = pure numTy
        go (Text _) = pure textTy

Once we make numeric literals be numbers and textual literals be text, we’re all done for constraint gathering!

What’s next?

Steps three and four are really easy! You just call applyBinding on the top type to make unification-fd do its thing, and that’s step 3 out of the way. I wrote a recursive variable lookup thing to handle step 4 but it turns out fullprune/semiprune do the same thing for you. Sweet!

I thought step 5 was easy, but my thing to handle it was wrong and there’s some other fun stuff to talk about at the same time. So I’m going to leave it as an exercise for the reader, and maybe I’ll write about it later! (edited to add: I wrote about it later! It’s called From Logic Variables to Type Variables)

So… apart from that, what’s next?

This take on let bindings doesn’t support inferring types of recursive functions. I think there are at least three options: eliminate recursion with a fixed-point combinator, allocate a type variable to represent the type of this let-bound term and stick it in the environment before you start gathering constraints for the term, or allocate type variables for all of the let-bound terms and stick them all in the environment when gathering constraints for all the terms. I’ve tried the second option, and it seems to work. The third option should let you have mutually recursive functions.

Using let-bindings as a point of introducing and eliminating type variables is a pleasingly simple way to get out of the parametricity pickle. It feels kind of arbitrary to me, though. It might be more theoretically elegant if we didn’t unify all of the types involved when we apply an argument to a function, and instead we demanded that the value we’re applying be a subtype of the expected argument type. I believe this would push me to figure out what the deal is with unification-fd’s subsumes function. I’d probably learn a bunch about covariant and contravariant functions in the process.

Speaking of beliefs… I believe my implementation works, but it’s a pretty weak belief. I should probably test it more thoroughly. I’d also be interested in trying to get a deeper understanding of the practical difference between my NeedsFreshening hack and the Type Scheme business. I kind of want to implement both my approach and the one Diehl uses in coq and try to prove their equivalence.

Useful programming languages normally offer some sort of compound type and some way to build things up and break them down. Maybe I should look into that. But for now, I’m pretty happy to have written my first type inferencer!

Convening The Flag

I just got back from three months at the Recurse Center in NYC. One of the things I did there was co-organise / lead a group (with the excellent Dom) that worked through a series of Capture the Flag challenges. We used the first Stripe CTF, which has a definite unix/C flavour and is reasonably easy for people to get set up on their own machines. I’m going to talk here about what we did, what worked well, and what wasn’t so good.

What We Did

We started by putting a kickoff event on the calendar. I figured out in advance how to do the first level of the CTF, and at the kickoff event I plugged my laptop into the projector, gave a little spiel about how I imagined this would all work, opened up the first level, and asked the room for their thoughts on what to do next. In my introductory remarks I described a process of looking for something that’s a little off, and learning about it or poking at it until you either find a way to make it do what you want, or the value of this investigation goes down enough that finding something else to investigate is more valuable. I also tried to assess the interest in doing these Unix/C challenges versus the interest in doing a series based more on web security.

My goals when working through the first level as a group were to try to make these challenges feel approachable, to hear from as many voices in the room as possible, and to surreptitiously prevent the group from going too far down any unhelpful paths. I don’t think I can take too much credit for getting to hear lots of voices, because I suspect RC people are generally mindful about not dominating conversations. I was pleased with how the conversation went, though, because when somebody answered a question I could ask the rest of the room a follow-up question and get the next answer from somebody else. I think there was only one point where I deliberately declined an answer from somebody and sought a response from elsewhere in the room.

After capturing the first flag together, there was a brief intermission while I figured out ports and firewalls and IP addresses to access the second level. We then did the second level together, but it was pretty scattered. A lot of folks had wandered off, and while I’d been tinkering lots of side conversations had started up and I think many of those were more compelling than the second level. It also didn’t help that I hadn’t prepped the second level, so I was genuinely exploring possibilities along with everyone else in the room. That meant I didn’t have as much attention to devote to looking after the conversation.

We then went our separate ways. I tried to make myself available to answer questions and offer hints during the following two weeks, and then we reconvened to discuss levels 3 and 4. I think it’s safe to say that level 4 was the one that took most time for us. It was also the one with the least code. This is not a coincidence! The obvious thing to do in response to the bugs in the other 5 levels involved very little extra code, whereas most of us responded to level 4 by trying to inject some shellcode. That made it an awful lot trickier to get right. I was very impressed by the variety of solutions for level 4. One person had a very elegant ROP-gadget + shellcode attack; another found a way to disable Address Space Layout Randomisation, and managed to arrange the details for a jump to system and pass it exactly the right command to capture that flag. Another participant had a preposterously dunderheaded approach based on running things enough times that the randomization didn’t matter, and putting a binary on the path with a name taken from some gibberish that happened to be being passed tosystem consistently.

Two weeks later, we met again to discuss levels 5 and 6. In theory I was still available to offer hints, but I hadn’t solved either of them before, so I could really only be a sounding board. One issue we had was that some of our participants took a while after our second meeting to figure out the details for their level 4 attack and didn’t have enough time to complete level 6. That meant we had fewer people with solutions to discuss, but the different tenor of the conversation was pretty fun. For instance, I initially attempted a very subtle attack, which completely failed, but I was able to reuse the infrastructure I’d built up to do a better attack very efficiently.

For the latter two meetings, I had a bunch of questions in mind beyond just “how did you do it?” to keep the conversation going. They were questions like:

  • what did you learn?
  • what was the most annoying thing about these levels?
  • what struck you as particularly cool?
  • are there any tools you wish you’d had to solve these?
  • who did you learn from? who did you teach?

I also had some questions about the thing as a whole for the end of the last session:

  • what would you change about this CTF?
  • what was your favourite level?
  • which attack are you proudest of?

What Worked Well

We had about 20 people for the first meeting, then roughly 8 for the second, and around 6 for the third. One thing I enjoyed was people who hadn’t completed the challenges I picked for the group, or who had picked out other tasks entirely for themselves, came along and could tell us a bit about what they’d been up to. They were able to use their lack of knowledge of the Stripe CTF to force those of us who were doing those challenges to explain things properly.

For the most part, people worked in groups. People would often work together for a time, then peel off and do something else. I worked alone for all of these challenges, but I had some conversations with folks about what sort of things they’d been trying that helped me avoid getting too stuck. This was especially helpful with level 4, where it hadn’t occurred to me that I might not need shellcode.

I think I did an okay job of encouraging people to explain their attacks more fully. For instance, some people used ROP gadgets and some people had no idea what that means. It’s easy to forget that not everyone knows about all the things!

We liked this set of challenges, for the most part. When I asked everyone what they’d change about this CTF, the answer was basically just “more!”. Many of the levels had multiple solutions, and they covered a wide variety of security problems that have occurred in the wild.

What Wasn’t So Good

The Recurse Center is full of people in different parts of their programming journey. Many people there have only ever used python or javascript, and many aren’t yet familiar with web programming. People might use linux but only have beginner-level knowledge of how shells work. I would have liked to have a set of challenges that worked for everyone who turned up interested, but with such a broad set of people I doubt that it would be possible.

I asked a question in that first session wondering if we should abandon the Unix/C challenges in favour of some web security challenges. I did not handle this well, though. Only two people raised their hands, but there’s a good chance that 4 or 5 people wanted to raise their hands but didn’t. After all, that is how raising hands works. And because only two people raised their hands, I continued with my initial plan. To recap: I drew out two people to express an unpopular opinion, proceeded to disregard the opinion that they had bravely expressed, and made a few people feel awkward wondering if they should raise their hands or not. I could have asked the question in a way that stacked the decks in favour of the web security challenges (perhaps by asking if people didn’t feel like they knew enough for the Unix/C challenges, rather than asking if they felt they’d do better with web security), or just skipped the question entirely and explained why I thought this set of challenges would be good.

I wish I’d figured out more levels in advance so that I could give good hints for them. This burnt me particularly badly for level 3, where I worked out the hint I wanted to give out after I was done giving hints for that level.

I’d like to learn more strategies and tactics for keeping a group conversation ping-ponging like it did in the first meeting. It went really well, but in a room like that, it’d only take a few people playing by different rules to make the session take a very different path.

I also wish I’d had a better inclusiveness toolkit for people who had got stuck on level 3 or 4 but kept coming along. I wanted to get them into the conversation, so I asked if they’d finished level 3, and they said no, and I didn’t have anything to ask next. Since this was fairly late in the conversation that day, I could have asked if there were any techniques in the discussion that they’d used before, or that they’d like to use, or if these conversations were making it easier or harder to believe that they were writing secure code.

Would I do it again?

Yes!

Huge thanks to the folks at Stripe for putting the CTF challenges together and distributing them after their competition ended, to my co-conspirator Dom, and to everyone who came along to any of the sessions.

And finally, a huge congratulations to my fellow flag-capturers. It was a pleasure to hack a few of the things with you.

Thanks to Annie for commenting on a draft of this post 😀

Towards A Fundamentally and Fundamentalistically Declarative Client-Side Programming Model

How much of the space of user interface programming models have we, as a society, explored? The story starts with pretty raw event loops and runs through explicitly associating event handlers with elements (like in gtk+ or the HTML DOM) and MVC (as in the Smalltalk tradition). Somewhere there we got FRP-based approaches and their more-imperative Rx cousins.  I think a more purely declarative approach might be possible, and I’m interested in what it would look like.

What do I mean by a fundamentally declarative client-side programming model? The view should be a pure function of the state, and the state should be a pure function of stuff that’s happened in the view. This rules out traditional event handlers that explicitly supply an effect when the event occurs, because putting effects in the view changes our system description from “the state is a pure function of stuff that’s happened in the view” to “the state is a pure function of effects that were triggered by the view”.

One option would be to take a leaf out of the FRP kids’ book and define a Html structure to be a thing that ultimately produces some stream of values, and each event handler within the Html returns a value that will go into the Html‘s stream. There are a few issues around composing these things since if you have some widget generating a Html to include in another Html, you need to massage the values going out to fit in the containing type, and rearrange things coming back in to make sure the inner widget sees its values. All these things can be dealt with but it seems to require some pretty meticulous knot-tying. If we give ourselves permission to go with whatever wild stuff we can dream up, where might we end up?

Let’s define a page. Starting simply, maybe a page is just some text.

page = [text "Hello, World"]

But we might want some interactivity… maybe a button?

page = [text "Hello, World", button "Do Not Press"]

See, the button says “do not press” because nothing’s going to happen when you click on it. Maybe we want a button to be an increment button?

page = [button "+", text "It's been clicked 💥 times"]

But it can’t have been clicked 💥 times. 💥 is not a number. Nothing happens 💥 times. If something were to happen 💥 times, it would be a disaster. So what do we put in that space?

page = [button "+", 
        text "It's been clicked ",
        text (numberToString (clickCount "+")),
        text " times"]

Okay, so that’s cute, but how can we define this clickCount thing? First, we need to look at our buttons, which are the things returned by button. We can imagine a function buttonClicks that returns the click events for a button. But there’s a problem with this, because I want to live in a purely functional world and in a purely functional world the return value of a function is determined entirely by its arguments. That means the button returns the same thing for any given argument. In a sense, there is only one button labeled “+”, even if that one button appears many times on the screen. We may want the button to do different things in different places, though, so we can’t just pass the button to buttonClicks.  We could try to differentiate the buttons by defining a function that produces particular buttons:

incrementCounter = button "+"

But every button with the same label is going to be the same button. How can we make sense of where exactly a click event has come from?

counterPanel n = incrementCounter
page = [counterPanel 0, counterPanel 1, counterPanel 2]

I’d like to introduce an idea I’m calling a “suspension”. A suspension is a value that captures the idea of calling a function: it refers to a function, and it has some arguments to pass to that function. It also has a notion of parents, which are suspensions referring to stack frames above this one in the call stack. Once you have suspensions, you can say things like “give me the clicks on buttons returned from incrementCounters returned by the counterPanel with a number argument of 1″. In a gentle homage to lisp’s quoted expressions, my syntax for the suspension describing those buttons right now would be '(incrementCounter ^(counterPanel n:1)).

counterPanel n = [incrementCounter,
                  length (buttonClicks '(incrementCounter ^(counterPanel n:1)))
                 ]
page = [counterPanel 0, counterPanel 1, counterPanel 2]

Saying that is one thing, but having it work at runtime is quite another. Fortunately, it’s an entirely doable thing! My implementation right now is extremely rough and ready, but programs like the one above definitely work.

A suspension is not a function call expression. The syntax for a suspension resembles that for a function call, but is different in a few key ways. One is that the start of the suspension, the reference to a function, is restricted to being a variable name and will not be evaluated as an expression. The suspension refers to frames produced by the lambda referred to by that variable name, which simplifies the semantics by avoiding questions about exactly which expressions are matched. What should happen if you have two identical function expressions produced by different expressions? Restricting the first field to be a name gets us out of this pickle.

Another reason for the suspension syntax is that the language I’m working with has a sort of haskelly flavour, so once a function expression has all its arguments applied to it, it executes. That means that if you had a suspension that specifies all of a function’s arguments, and you tried to represent it as a function call expression, the runtime will evaluate the expression and keep its result. The expression’s function-call-ness is lost! The runtime can no longer use it as a suspension. It would be possible to resolve this with a syntax that more explicitly indicates when a function invocation should occur, but only a very small proportion of expressions would be valid suspensions. It’d be easy to write something that is a suspension, then make some changes to your code, then accidentally not have a suspension anymore. I really like having a syntax for suspensions that makes it difficult to accidentally turn a suspension into not-a-suspension.

Beyond syntactic differences, I use suspensions to offer pretty alien language features. Function pointers and closures and continuations all offer some way to hold a handle to a computation, but the only thing you can do with it is commence the computation. Suspensions hook into the runtime to let you talk about function invocations that have been made or might be made in the future as part of a computation.

Why do I think this is interesting? Things in front-end programming today have a multitude of names. An element might have the name used to attach an event handler to it, the function or template returning the markup used to generate it, the names used to associate CSS styles with it, maybe some names used to manipulate it dynamically. On some level, this work is an experiment in not giving things any name at all. The increment button is the thing returned by incrementCounter and that’s how we talk about it. When we don’t give a thing any names at all, there is no way to talk about it except by what it is, and when we only talk about what it is there’s no way for names to drift or lag or become redundant or contradictory.

This is a new way of looking at functions and their arguments. As well as talking about a function as a computation to be performed and arguments as inputs to that computation, we can also use the function and its arguments as a way of naming the value (or values) returned by that function. Maybe we can keep using this language to refer to values long after they’ve been returned from the function that has become their name.

In the context of the web, this perspective on functions and arguments can be taken further and used to organise the information by which a website decides what to render. Today, data that informs what people see on a web page comes from all over the place, particularly the URL’s path and query string, the user’s session, and any other cookies or local storage. Each of those sorts of data takes it own route to inform what users ultimately see on the page. If they are all just function arguments, then the interplay between those arguments and the function suspension mechanism dictates where suspended data needs to be stored or sent. If a suspension describes stack frames across multiple different users, then accurately rendering pages that depend on that suspension will necessitate sharing the values from those stack frames by storing them on a server. If a suspension describes frames across multiple pages, then things will need to be saved in cookies or localstorage or the server. I see glimmers here of a language describing what needs to be true of how data is stored, independently of how it is stored in any given version of an application.

There are a few caveats right now. The implementations I can offer you either remember every stack frame ever (in case they are ever suspended), or require functions to opt-in to being remembered (in which case only functions that have opted in can appear in suspensions). I’m working towards a type system that uses types to guide the opt-in flags, but that work is in very early stages. Either way, remembering stack frames (whether you remember all of them or some partial set) is likely to turn into a memory leak, and if this work is to go anywhere I will need to grapple with that somehow.

There are also some little logistical issues. My current implementation returns the values that match a suspension, and multiple buttons can potentially be described by a suspension, which means that you need to do a lot of getting lists and mapping something over the list and aggregating things. So when I say “programs like the one above” work, I mean programs that have a bunch of sadly-obfuscatory mapping and concatenating and summing in order to make them work. This is somewhat galling when the suspension you’re looking at can only really refer to one value, but that’s the nature of life with research-grade software.

Particularly eagle-eyed readers might have noted that I said above that my current implementation returns the values that match a suspension, and that this contradicts my earlier statement that every value returned by incrementCounter is identical. Right now, the value semantics are somewhere between quirky and completely broken. When you retrieve a value from the suspension infrastructure, it’s annotated by the whole call stack that produced that value. You can’t see that at the language level, but the runtime uses it internally to route events to the right place. This is the sort of thing that’s likely to get very confusing, since you can have two values that are equal but if you pass them both to htmlElementEvents you’ll get two different lists back. Are there implementations of ideas like this that allow exploring the potential advantages I describe above but without doing quite so much damage to the value semantics?

There are some big questions around exactly which operations you should be allowed to perform with a suspension. Right now, you can suspend any function and get a list of stack frames back. This works in full generality; the runtime will keep on running your program and passing it the new lists of frames from suspended function calls until it settles down. I do not know if this is guaranteed to converge in general, nor do I have much of an idea of what might cause it to diverge. These suspensions are a big stupid hammer that it’s pretty fun to throw around. The main motivation for them, using them to route user interface events, probably doesn’t need such a very large hammer. It’s probably worth trying to build the smallest sufficient hammer.

Is any of this a good idea? I don’t know!

There are three rough directions I could go in from here:

  1. Keep working and playing and experimenting with suspensions and their possible semantics.
  2. Step back a bit. Look at the benefits suspensions might offer for routing events and describing where data needs to be stored and try to figure out what less-exotic language features might work in the same way. As an example of what that might look like, one way of thinking of my current prototype is that there’s sort of an implicit global variable containing the history. That global variable holds a number of events of varying types, and suspensions let you get the events of a particular type out. If that ability to get elements from a heterogeneous list by type is actually the useful thing, that could presumably be built without any of the other exotica.
  3. Step back a lot. One of my motivating interests in all this is building an app where the server-side state is defined as a pure function of things that have happened on the client, and I could try to do that with only the current set of language features in haskell or ocaml.

At this stage I’m planning on continuing with my current direction. I don’t know if any of this is a good idea, but I definitely think there’s some interesting ideas in here, and I don’t know of anyone else pursuing them, and that’s enough for me.

Huge thanks to Julia, Vaibhav and Veit for commenting on a draft of this post!

LambdaConf, Yarvin and Moldbug

I’ve spent the last weekend enjoying the excellent National Folk Festival in Canberra and thinking a lot about the question of whether or not Curtis Yarvin should be invited to LambdaConf, and if so, whether or not I want to attend. This post is about the latter.

I am only interested in going to LambdaConf if I can do so in opposition to the ideas that Curtis Yarvin published under the name of Mencius Moldbug. Even if Yarvin is right when he protests that he is not a racist (and I’m not sure that he is right when he makes that claim), I believe his ideas have fueled and provided cover for a large number of people who do have racist views. He has the power to speak and write in such a way as to push back against those people, but he has not done so. He has the power to say that slavery as an institution in the USA was evil and wrong, but as far as I know he has not done so. He has the power to say that any movement in the direction of a return to slavery in the USA would be evil and wrong, but as far as I know he has not done so.

For a while, I thought I could attend LambdaConf and do so explicitly in opposition to Moldbug: I could wear a badge that had the word “Moldbug” in a red circle with a line through it, and I could have “I Stand Against Moldbug” in the footer of all my slides. Since Curtis Yarvin has said that he, Yarvin, is attending in a professional capacity, and that the Moldbug identity has no connection to his professional capacity, I thought this might be a viable option. Unfortunately, then Yarvin published a post in a Reddit AMA clarifying a Moldbug post, writing as Curtis Yarvin, and using the first person pronoun. This suggests that the line between Yarvin and Moldbug is not as clear as all that, and that to attend in opposition to Moldbug would be to attend in opposition to Yarvin, which would be an ugly and nasty sort of thing to do. So I don’t want to do that.

My next hope was that I could attend LambdaConf in opposition to Moldbuggian ideas. I could wear #BlackLivesMatter merchandise and RISE logos and Black Girls Code tees and put a rainbow sticker on my laptop and find unmoldbuggian examples to thread through my talk. But as I thought more about the way this would work in practice, I remembered something about John de Goes’ original post, in particular the proposed addition to the pledge of conduct:

That in consideration of the professional nature of this event, I will refrain from discussing potentially offensive and divisive topics unrelated to the topic of programming (specifically religion, morality and politics); except in the company of willing participants to such conversations, and even then, only in a manner consistent with the pledge;

My reading is that attending in opposition to Moldbuggian ideas might be viable this year, but if they could change the code of conduct for this year they would have, and it would make this approach unviable. The presence of people who do not stand entirely behind all aspects of the pledge of conduct means talking about the things it promotes can be divisive, and means even people who want to boost the inclusion of minorities (the list in the pledge includes women, people of colour, disabled individuals, and non-cis sexual orientations and gender identities) have to leave that desire at home. The general direction here is that people should attend exclusively in their professional capacities, and leave the personal and political sides of themselves at home.

There’s a sense in which this years LambdaConf is now an experiment in a particular way of dealing with issues of this sort. I could go in support of unmoldbuggian ideals this year, and possibly contribute to the success of the experiment, but attending in the same manner the next year would probably be impossible. I’d like to avoid this outcome.

In fact, I think the only viable way to attend in opposition to Moldbug is to go proudly and strongly in support of the idea of that split between the personal and professional. There are two problems with this: the first is that I have no idea how to attend strongly and proudly in support of that distinction; the second is that I am not proudly and strongly in support of that idea. It’s one thing for Yarvin to leave his political views at home, but somebody whose ancestry didn’t give them fair skin doesn’t get to leave that behind and present as white for the weekend. Somebody who is queer might be able to choose to pass as cis and straight for the weekend, but I can’t endorse asking them to do so. There are so many aspects of the personal that cannot be left behind that I cannot say I am proudly and strongly in support of separating it from one’s professional identity.

Since I do not currently believe I can attend LambdaConf in opposition to Moldbuggian ideas, I do not currently plan to attend LambdaConf this year.

There are many events; their organisers make choices. There are many potential attendees; we choose which events we want to participate in. I choose not to participate in this one.

That I can make that choice is indeed a luxury for which I am grateful. I hope that De Goes succeeds in making LambdaConf into an event where people who are frequently marginalized by our industry are welcome, and can find a foothold to get into or advance within our industry. I hope that MoonConf succeeds in creating a space where people who feel that they cannot attend an event that sponsors Curtis Yarvin can connect, learn, and share ideas.

There’s a number of things I am deliberately not arguing here:

  • that LambdaConf must exclude Yarvin. I think De Goes and the committee are free to try this experiment in requiring their attendees to leave objectionable views at the door.
  • that a government should
    • exclude Yarvin from participating in official events. Governments excluding people for their beliefs has not worked terribly well on either side of the aisle.
    • allow Yarvin to participate in official events. I do believe that Right and Wrong exist, and that it is entirely possible that Yarvin holds, continues to hold, and has worked to propagate views, and to argue for the implementation of views, that are Wrong.
  • that Yarvin holds, continues to hold, and has worked to propagate, and to argue for the implementation of, views that are Wrong. I think much of his writing is disingenuous at best, but I haven’t read enough of it to conclude that he’s wrong in such a manner that he ought to be excluded. I think I can make a decision on this matter without putting myself through that. I do, however, trust David Nolen’s, Sara J Chipps’, and Bodil Stokke’s reactions to his writing enough to think that I’d probably want him to be excluded. Please don’t @mention these folks in relation to this blog post, they have said what they want to say on this matter and are quite capable of jumping in if they wish to discuss it further
  • that people need to be objectively and evilly Wrong for them to be excluded. Dan P and Alissa (Please don’t @mention these folks in relation to this blog post, they have said what they want to say on this matter and are quite capable of jumping in if they wish to discuss it further) make strong arguments that building an inclusive community can require choosing to exclude people who will produce too much discomfort amongst the people who make up that community.
  • that people must remain eternally responsible for things they claim once. Yarvin has not given any sign of having changed his mind on the problematic things in question, so the notion that he’s being attacked for something he said once in the distant past is not an argument that has any force for me.
  • that if Yarvin were to say All The Right Things after reading this blog post I would change my position. I haven’t read enough of his writing to know how he’d need to change his positions to make me think they didn’t need to be opposed.
  • That this list contains all the things I’m not trying to say with this post.

I’d also like to thank Amar Shah for shedding light on the feedback-gathering process and to Julia Moronoki for her thoughtful post about why she and Christopher Allen are continuing to sponsor the conference (again, please don’t @mention them). In general, I’d like to thank everyone for making most of what I’ve seen of this conversation into a respectful argument and not a screaming bunfight.

Matilda, Cambridge Theatre, London

Where 1984 was gripping, exhilarating and exhausting, Matilda was lovely, light and fun. The opening number riffs somewhat on Monty Python’s “we’re all individuals” as all of these kids sing about how special and unique and wonderful they all are. Where the Pythons play the lone ordinary person for laughs, this one is decidedly darker, with the music and melody going sad as Matilda introduces herself, singing about how she’s not special and not wanted. The joke of how everyone is special and unique keeps the scene from getting too dark, but it’s definitely got an edge to it. This is a pattern throughout most of the musical, with the darkness of Matilda’s home situation and the grimness of her school being lightened by various jokes. In particular, the people that inflict most pain (Matilda’s parents, the Headmistress) are comical figures.

The Headmistress is played by a man in drag, which works exceedingly well, although it may not be entirely politically correct these days. Her evil is played to the audience as a joke, but its consequences are played as real and serious, which works really well as a device for leavening comedy with humour.

I was surprised by how late in the piece Matilda’s ability to move things with her mind is introduced. It only appears three times. The emphasis is really on Matilda being special because she’s so smart and already knows how to read when she starts at school and likes going to the library and making up stories. In other words, she’s special in ways that I expect resonate for lots of kids in the audience. That she’s special in ways that make lots of people in the audience all feel special in the same way is unexamined, and I wouldn’t want it otherwise. It’s not that sort of show.

As you’d expect with music by Tim Minchin, the lyrics have plenty of clever rhymes and are generally top-notch. I wished (as I usually do with musicals) that I could hear the words better; in particular, the kids bits were sung in a pretty high register that I had trouble with. I was generally impressed by the kids, though: there were a lot of them and they were actually good, which is not what I expect of child actors.

The writing generally was fun, with three figures deliciously representing various kinds of evil: the bumbling father, the uncaring mother, and the sinister headmistress. The choreography was fun too, and it all ended happily ever after. It was a bit strange, though: there’s a moment when Matilda has the chance to be whisked away to a happier life, but she seems to want to stay with her parents. Then there’s a scene where nothing relevant changes, and then she makes the opposite decision and chooses to be whisked away. I guess the glimmer of Matilda’s parents liking her and Matolda wanting to stay with her parents makes leaving into a genuine choice, which makes the departure lighter in a way that would be lacking if we still thought her parents wholly indifferent. And that makes the happy ending even happier. Hooray!

1984, Playhouse, London

This was a stunning piece of theatre, which I went to largely on the strength of this review.

The show opens on Winston Smith, sitting at a table and opening a book. He muses to himself about the consequences of the diary he is about to start, the consequences of thoughtcrime.

Then, snap, the show twists into another time, another place. A group of people discuss a book and the reality that book describes. Smith is still present, still sitting at the table, but not interacting with the scene. In his presence on the stage he mirrored the way 1984 is present in the background of any discussion of Big Brother-esque behaviour. We don’t know this scene’s time and place, and the scene nurtures that unknowing. They could be in the fictional universe of 1984, talking about Orwell’s novel. They could be in the fictional universe talking about Smith’s account. Or they could be in our universe, in the future, looking back at the present day where the events of 1984 are in the process of taking place. It’s the play on this final possibility, that 1984 is actually coming true right now in the world outside the theatre, that gives these scenes their power. The play is made up of scenes that are taken from the novel interspersed with scenes discussing the book in this ambiguous other time.

There are two other scenes that I want to talk about. One is a scene that repeated 3 or 4 times, with characters in the scene being unpersoned between repetitions. After the first two times we know how the scene goes, so it keeps making a sort of grisly sense even as people are removed from it. The most macabre aspect is that one of the players in the scene is talking about how proud he is of his daughter for informing on someone, and when he vanishes we know exactly who reported him to the Ministry of Love. Later, in the Ministry of Love, he talks again about how proud he is of his daughter, and it is heartbreaking.

The other scene is at the end, when Winston Smith is in the Ministry of Love and they are having what we might enigmatically call a civilized discussion on the matter of what two plus two is equal two. The whole scene is brilliantly done, with the treatment that Smith is receiving being conveyed by smart deployment of a sound design full of low buzzing and high cracking, and a light design involving columns of high-powered light delivering singular potent strobe-like flashes. That technology works to complement the acting, which is just brilliant. The most powerful moment for me was when Smith, as the rat cage is moved closer to his face, breaks the fourth wall – which has been maintained for the entire production – and pleads, exhorting the audience not to sit there and watch, not to let this happen. To have what feels like a personal request from a man in such very great distress makes for extraordinarily powerful theatre.

It also a moment that, without preaching or getting on a high horse, brings to mind a key question that we would do well to have on our minds as we leave the theatre: are we going to sit there and watch, are we going to let this happen, as the world around us develops more and more attributes that look like something Big Brother dreamed up? That this call was delivered primarily as a way of heightening the drama impressed me a lot. That it also serves to highlight that point is a very clever bit of theatrecraft.

You may disagree that the world around us is developing more and more attributes that look like something Big Brother dreamed up. I consider the matter more or less proven by the Snowden disclosures, but if you’re not, remember that this production was held in London, which is one of the most heavily surveilled cities in the world in terms of CCTV. And if that doesn’t strike you as being amiss, the play remains an extraordinary play, weaving together straightforward depictions of the book’s text and scenes that render those depictions ambiguous on the way to a gripping climax when Smith finally cries out a broken man.