A Hindley-Milner type inference implementation in OCaml

Hindley–Milner (HM) is a classical type system for the lambda calculus with parametric polymorphism. If you want to know what HM is and why it is cool, please read Daniel Spiewak’s article “What is Hindley-Milner? (and why is it cool?)

There are multiple implementations of HM in many different languages.

I reimplemented the algorithm in OCaml guided by Scala and Python implementations. All these implementations are based on the Modula-2 implementation of the Cardelli 1987 paper. Because the algorithm described in the paper depends on mutable states for optimization, it is not easy to implement it directly in a purely functional programming language like Haskell.

The program implements a small functional programming language with no concrete syntax. The language consists of identifier, lambda, application, let and let rec. The type inference algorithm reconstructs the type of the given example expressions in the context of some predefined types. The program produces the following results when executed:

letrec factorial = fn n => cond zero n 1 times n factorial pred n in factorial 5 : int
fn x => pair x 3 x true : Type mismatch bool != int
pair f 4 f true : Undefined symbol f
let f = fn x => x in pair f 4 f true : (int * bool)
fn f => f f : Recursive unification
let g = fn f => 5 in g g : int
fn g => let f = fn x => g in pair f 3 f true : (a -> (a * a))
fn f => fn g => fn arg => g f arg : ((c -> d) -> ((d -> b) -> (c -> b)))

Installing OCaml+OPAM+utop+Core Library on Ubuntu Saucy

Install OCaml+OPAM

There are PPAs available that are pinned to specific revisions of OCaml and OPAM.

add-apt-repository ppa:avsm/ppa
apt-get update
apt-get install ocaml opam

Or you can install pre-compiled versions using Binary installer.

wget https://raw.github.com/ocaml/opam/master/shell/opam_installer.sh
sh ./opam_installer.sh /usr/local/bin

Install utop

utop is an improved toplevel for OCaml. You can install utop using opam installl.

opam install utop

Install Core library

Core library is Jane Street’s alternative to the standard library. You can install Core library using opam install

opam install core
opam install async
opam install core_extended

Put the following in your .ocamlinit file to use Core and its associated libraries and syntax extensions in the toplevel:

#use “topfind”
#require “dynlink”
#require “bin_prot.syntax”
#require “sexplib.syntax”
#require “variantslib.syntax”
#require “fieldslib.syntax”
#require “comparelib.syntax”
#require “core”
#require “async”
#require “core_extended”
#require “core.top”
open Core.Std

A Critique of Homoiconicity

Homoiconicity is a property of Lisp in which program and data have the same representation, namely S-expressions. This gives Lisp uniquely strong power to write programs that manipulate programs, such as interpreters and compilers. But it also makes it easy for a novice programmer to be confused between program and data.

Philip Wadler described several such confusions in his paper, “A critique of Abelson and Sussman – or – Why calculating is better than schemin“. The original paper compared Scheme and Miranda, but I changed them to more modern languages Clojure and Haskell respectively.

Clojure lists are not self-quoting

In Clojure, lists are not self-quoting while numbers as data are self-quoting. For example, 3 evaluates to 3 itself, but (1 2 3) evaluates as a function call, not as a list. To include list as a datum, one must quote it as in (quote (1 2 3)) or ‘(1 2 3).

user=> 3
user=> (1 2 3)
ClassCastException java.lang.Long cannot be cast to clojure.lang.IFn user/eval5 (NO_SOURCE_FILE:4)
user=> (quote 1 2 3)
user=> ‘(1 2 3)
(1 2 3)

When combined with the substitution model to explain the evaluation of (list (list 1 2) nil), this is quite confusing because the intermediate steps are no longer legal Lisp expressions.

(list (list 1 2) nil)
-> (list (1 2) nil)
-> (list (1 2) ())
-> ((1 2) ())

In Haskell, one just writes [[1,2], []]. It is already a value, so there is no evaluation to explain.

Further confusion with quote

Surprisingly, (peek ”abracadabra) evaluates to quote.

user=> (peek ”abracadabra)

because (peek ”abracadabra) expands to (peek (quote (quote abracadabra))) which evaluates as in

(peek (quote (quote abracadabra)))
-> (peek (quote abracadabra))
-> quote

Evaluating too little or too much

As Clojure represents program and data in the same form, one often mistakenly evaluates either too little or too much.

(peek (quote (a b)))

The result of evaluating the expression above is a. However, many programmers give the answer quote (too little) or the value of the variable a (too much).

In Haskell, no such problem arises because there is no quote.

Other confusions with lists

A list containing just one element x is (list x) or (x) in Clojure, while it is [x] in Haskell. As people tend to drop parenthesis, (x) often mistakenly read as x.

People also get confused between cons and list because (cons x y) and (list x y) in Clojure look similar while x:y and [x, y] in Haskell look distinct.

These are just notational differences, but some notations certainly make people more confused than others.


Programming language research does not talk much about syntax because the preference of one syntax over the others depends much on the taste of a programmer. But the famous S-expression certainly hinders reasoning with algebraic properties, such as associativity. Also most people who are accustomed to infix notations taught at math classes find it hard read S-expressions at first.


A Lisp family language such as Clojure traditionally has been regarded as a good introductory language to students or novice programmers thanks to its simple syntax (or lack of) and semantics. But one of its powerful language feature, homoiconicity is a mixed blessing for beginners as it makes it hard to reason about programs. Maybe this is the reason why other functional programming languages such as OCaml, F#, Scala and Haskell end up not including homoiconicity in the language.

Functions are Objects in Scala

Scala is a multi-paradigm language which supports both object-oriented programming and functional programming. So Scala has both functions and objects. But in the implementation level, function values are treated as objects. The function type A => B is just an abbreviation for the class scala.Function1[A, B],

package scala
trait Function1[A, B] {
    def apply(x: A): B

There are traits Function2, …, Function22 for functions which take more arguments.

An anonymous function such as (x: Int) => x * x is expanded to

new Function1[Int, Int] {
    def apply(x: Int) = x * x

A function call such as f(a, b) is expanded to f.apply(a, b).

So the translation of

val f = (x: Int) => x * x

would be

val f = new Function1[Int, Int] {
    def apply(x: Int) = x * x

This trick is necessary because JVM does not allow passing or returning functions as values. So to overcome the limitation of JVM (lack of higher order functions), Scala compiler wraps function values in objects.

The following method f is not itself a function value.

def f(x: Int): Boolean = ...

But when f is used in a place where a Function type is expected, it is converted automatically to the function value

(x: Int) => f(x)

This is an example of eta expansion.

The code examples in this article are taken from Martin Odersky’s Functional Programming Principles in Scala lecture.

Classes and Substitutions in Scala

In functional programming, it is conventional to define the meaning of a function application using a computation model based on substitution. In Scala, the meaning of classes and objects are also defined using substitution model. Let’s assume that we have a class definition,

class C(x1; …; xm){ … def f(y1; …; yn) = b … }

  • The formal parameters of the class are x1; …; xm.
  • The class defines a method f with formal parameters y1; …; yn.

Then, the expression new C(v1; …; vm).f(w1; …; wn) is rewritten to

[w1/y1; …; wn/yn][v1/x1; …; vm/xm][new C(v1; …; vm)/this]b

  • the substitution of the formal parameters y1; …; yn of the function f by the arguments w1; …; wn,
  • the substitution of the formal parameters x1; …; xm of the class C by the class arguments v1; …; vm,
  • the substitution of the self reference this by the value of the object new C(v1; …; vn)

Martin Odersky explained the evaluation of Scala classes and objects using this model in his lecture, Functional Programming Principles in Scala.