Friday, February 27, 2009

The Brilliance of Type Signatures in Haskell

I've been soaking up some Haskell lately, and have rediscovered the beauty of type signatures. As I vaguely remember from university, the ML-family of languages are truly profound in this regard.

Background

A type signature for a function in Haskell is somewhat like a method signature in Java (particularly with generics). However, they make stronger statements, due to:

  • Haskell is a pure functional language: there are no side effects (for the sake of this post). The expression (foo 5) is mathematical in nature.
  • Haskell does not support type coercion (see comments).
  • Haskell cannot introspect a given type.
Examples

Like many languages, Haskell provides tuples: e.g. (1,"hello") or (3.14, True). Consider this type signature for the function fst:

fst :: (a, b) -> a
This reads in English "fst accepts a 2-tuple of types a and b and returns something of type a". It is vital to understand that a and b are types: e.g. Int, String, etc.

If you guessed that fst returns the first element in the tuple: bingo. This is intuitive and natural.

However, there's more, as explained in Real World Haskell. Because of the purity of Haskell and its strong type system, it is true that the only possible behaviour for a non-pathological version of fst is that it returns the first element. In this example, there is no other option. That is mind-blowing; this kind of reasoning predicates some of the optimizations possible in Haskell.

Consider this example, noting that [a] is a list of a in Haskell:
concat :: [[a]] -> [a]

What does it do? In Java, I might provide Javadoc (which could be false) but even a method signature doesn't tell the whole story: the method might quietly log to a file, write to a database, or launch missles. Here, we might intuit that concat flattens a list (Ed's note: or the first level -- see comments):

concat [ [1] , [2,3], [4] ] = [1,2,3,4]


Bingo.

As one works more and more with Haskell, the type signatures become vital, and simply fascinating.

A final, complex example. A Haskell monad has a combining function. For simplicity, I'll call it myChain. It has a type signature of (where m is a Monad):
myChain :: m a -> (a -> m b) -> m b

For this post, m is a generic type. In Java, think of WeakReference<T> or List<T>.

In English, we say that myChain accepts a monad m of type a, and a function that accepts type a and returns a monad of type b. The result of myChain is a monad of type b.

If we stare at this long enough, we can guess the following about this function, without knowing anything about monads: it probably pulls out the item of type a from the first parameter and applies the function provided as the second parameter.

This is true. We've been able to intuit something about the mighty monad simply from the type signature.

Brilliant.

The Upshot

There has long been a tension between dynamically-typed languages and statically-typed languages. In the last few years, the dynamic languages have become the media darlings. What's not to love? They are productive and mind-bending in their own way.

However, don't sell the statically-typed languages short: especially, the functional languages with strong, static typing. This is deeply profound stuff. As I work more and more with type signatures in Haskell, I feel the psychic fingerprints of abstract algebra and other math theory: it is as though we prove programs rather than write them.

ps. myChain is known as >>= in Haskell, and is usually called bind.

17 comments:

Anonymous said...

I would only add two things:

To launch missiles, you have to put it in the type signature that you have a side effect. I love this.

In addition to the strong static typing, the inferred typing is what makes it usable. If it weren't for this, all of the algebraic beauty would be broken because the entire expression would be filled with type line noise, just as the functional-style libraries (e.g., lambdaj) for Java require.

Joe Fredette said...

Just a note, in your defn. of concat, you note it has type sig:

concat :: [[a]] -> [a]

then you try to apply it to the list
[1,[2,3],[[4]]].

However, you have several type errors. firstly, [1,[2,3],[[4]]] is not a valid list. since it's not homogeneous. That is, it's elements do not all have the same type. A simple proof, lists have type [a], for some type a, let a = "Int", well, that covers the element "1", but the next element, [2,3], has type [Int], so that doesn't work. Similarly, if you let a = [Int], you have a problem with the first element. Homogeneity important! To fix it, we can "lift" all those types into a list-of-lists. Doing so, we get:

[[[1]],[[2,3]],[[4]]] :: [[[a]]]

This solves the first type error. The other problem was just in application. Concat doesn't not flatten a list, it flattens 1 layer out of a list. Again, examine the types. Concat has type [[a]] -> [a]. In our case, a = [Int], so we go from [[[Int]]] -> [[Int]]. or from a list-of-lists-of-lists to a list-of-lists. We would have to "chain" these concats via (.) (compose).

Hopefully that's clear, good post!

Anonymous said...

# Haskell does not support type coercion.

Wrong

Michael Easter said...

@Joe. thanks! I fixed the post. That's what I get for writing at 6 am and not checking my work

@phil. re: missiles :-)

@anon. can you elaborate?

Moandji Ezana said...

I've just started reading Real World Haskell, it's mind-opening in terms of what static typing can be, compared to Java.

Adam Jones said...

@Anonymous

I'm not sure why you're saying this is wrong. As far as I know, it doesn't support type coercion.

You can manually specify a type that might not be inferred correctly (e.g. (16::Float)), but that's all done at compile time. As far as the resultant program is concerned the 16 above always was a Float, even if it was written using "integer syntax" in the source code.

You also can do type conversions with functions, but that's just like what you find in almost any other language. I suppose this is what you meant?

"Type coercion" typically signifies an implicit conversion between types. To my knowledge, post is correct; Haskell does not do this.

Michael Easter said...

Thanks for the note, Adam... I pulled the notion from RWH, which says that Haskell does not use simple type coercion, and implied that it doesn't really use any.

I wonder if Anon was thinking of type inference? From my limited understanding, type inference is useful in these kinds of languages.

Anonymous said...

I don't get this argument (and I've also read it in Real World Haskell)

As far as I can tell, the only thing you know from the type definition is that you get back something of type 'a'. But couldn't the actual value be any function whose return type is 'a'?

The name 'fst' could just as easily stand for 'FrequencyStablizedTransform' (whatever that would mean)....


I agree that if the type definition was defined as

returnFirstElement :: (a,b) -> a

then you could make the argument, but in that case, the 'type' definition is superfluous as an aid.


What am I missing?

-------
it is true that the only possible behaviour for a non-pathological version of fst is that it returns the first element. In this example, there is no other option.

Anonymous said...

I think anon@2/27-10:27AM may have been referring to unsafeCoerce, which really is something like a C cast, and as the name implies, very dangerous! But, it's not really part of the Haskell language; its purpose is to let you call C functions from Haskell and use the results in Haskell expressions.

anon@2/28-3:54PM: the idea in the "fst" example is that "a" is not a concrete type like Integer (types in Haskell begin with upper case letters). Rather, it is a "type variable". The signature of fst is (a,b)->a, which tells the first arg can be of any type at all, the second arg can be of any type (maybe the same as the first arg's type, maybe not), and the return type is the same as the type of the first arg. If you search for the paper "theorems for free", it tells you how type signatures can sometimes let you deduce what the function does. "fst" is an example.

Ankur Arora said...

@anon. Re: returnFirstElement :: (a,b) -> a

As I understand it (and I'm a beginner myself), we have two things to infer behaviour from: strong static typing and that every function (a pure on anyway) is just a single expression.

Thus, the type of the function also allows us to infer the type of the expression defining the function.

So, to put it simply if
> FrequencyStablizedTransform:: (a,b) -> a

wanted to return any value of type 'a', then well, where'd it get any from? The typing definitely doesn't allow it to call a RNG. It also doesn't know the specific type of 'a' . So it cannot return, for instance: 2 * a.

That said, however, I'll be glad if somebody who knows haskell confirms this :).

Anonymous said...

OK I think I understand the argument for being able to infer 'fst'. But I don't see how it's useful or interesting. Clearly this can't be generalized, otherwise there wouldn't be any need for functions at all

Ankur Arora said...

> OK I think I understand the argument for being able to
> infer 'fst'. But I don't see how it's useful or
> interesting.
Why do you say that? Nothing that I said earlier was specific to fst at all.

> Clearly this can't be generalized, otherwise there
> wouldn't be any need for functions at all
Not true. And this can be generalized. For instance take
head :: [a]->a

What you can infer from the type is that it gives you an element from [a] unmolested. The first element follows from the name but it's perfectly legal for it to give any other element as well.

Anonymous said...

But that's my point. If you can infer the purpose from the name, then the type signature is not relevent. More relevent, when I suggested that one can't generalize, what I meant was that you can't write a type signature like

x :: (a -> b) -> c and tell me the result. You can only tell me that given two types, you'll get something back of a third type but you can't tell me its value. If you could, there'd be no need to write down the function for x




--------------
The first element follows from the name

Michael Easter said...

There are 2 aspects to my post. One is that one get intuit the behaviour about functions from the type signature. This may or may not be interesting. I think it is very useful.

The point in RWH regarding fst is that because of some factors, one can reason that there is only one reasonable way the function can work. This is a strong statement.

In Java, consider this interface:

T getFirst(List<T> t);

This probably just returns the first element but due to side-effects we don't really know what it does.

For the same signature in C++, it might even do different things based on the type of T (e.g. if T == Integer versus T == String).

Haskell is pure and value-oriented, so there are no side effects to worry about. Also, I think that a function cannot introspect the type parameter. The upshot is that makes it easier to reason about the implementation, which has direct benefits for concurrency and other optimizations.

Brian Gilstrap said...

I'm completely lost about the assertion that fst must return the first argument.

What if b is a type which contains an 'a' and does some operation (such as multiplication) of the 'a' passed and the 'a' contained in the 'b' and returns that? What if it returns one of the two based upon a comparison?

Either the assertion is wrong or I'm missing something about Haskell... :-)

Ankur Arora said...

> I'm completely lost about the assertion that fst must return the first argument.

> What if b is a type which contains an 'a' and does some operation (such as multiplication) of the 'a' passed and the 'a' contained in the 'b' and returns that? What if it returns one of the two based upon a comparison?

But, fst doesn't know anything about types a or b. They are type variables, and well could be anything. Thus, no looking at the innards of b. No 2*a. No comparison...

> Either the assertion is wrong or I'm missing something about Haskell... :-)

Michael Easter said...

@Brian

I was thinking of a non-pathological implementation (e.g. one that tries to do the right thing).

However, I think it might even be the case that it must return the first element of the tuple.

Note that if we have fst for ( Integer, String ) then fst could return any Integer. However for fst (a , b) where a and b are types: we don't know what type 'a' is, and we don't have any way to get another one, so it may be true that we must return the first element.

In your case, I believe you are equating 'a' to 'b'. i.e what is fst (a, a) ? In this case, it is true that fst could return the 2nd element.