Encoding algebraic data types in JavaScript

JavaScript lacks algebraic data types, but we can emulate them because JavaScript has first-class functions.

Here’s a simple Haskell program declaring a Tree data type with Bud, Leaf and Tree data constructors. treeSum calculates the sum of the leaf nodes by visiting every node in the given tree.

data Tree = Bud | Leaf Int | Tree Tree Tree deriving Show

treeSum :: Tree -> Int
treeSum Bud = 0
treeSum (Leaf n) = n
treeSum (Tree t1 t2) = treeSum t1 + treeSum t2

main = do
	let t0 = treeSum Bud
	putStrLn $ show t0
	let t1 = treeSum (Leaf 2)
	putStrLn $ show t1
	let t2 = treeSum (Tree (Leaf 3) (Tree (Leaf 5) Bud))
	putStrLn $ show t2

In JavaScript, we can emulate the data constructors of Tree with higher-order functions. The trick is to pass b, l and p as arguments to the data constructors. Each data constructor uses only one of these three arguments to build its own result (Tree simply forwards all three to its subtrees):

  • b is a value for Bud.
  • l is a function that takes an integer n and returns a value for Leaf.
  • p is a function that takes the left and right subtrees and returns a value for Tree.

(ES6 let and arrow functions are used for conciseness.)

let Bud = b => l => p => b;
let Leaf = n => b => l => p => l(n);
let Tree = t1 => t2 => b => l => p =>
           p(t1(b)(l)(p))(t2(b)(l)(p))

You can think of Bud as a value by itself. For Leaf and Tree, you can create each node by partially applying arguments.

var t0 = Bud;
var t1 = Leaf(2);
var t2 = Tree(Leaf(2))(Bud);

With this encoding scheme, defining treeSum is trivial. You just need to pass three arguments to handle Bud, Leaf and Tree respectively: if w is Bud, it returns 0; if w is a Leaf node, it returns n; and if w is a Tree node, it returns the sum of the left and right subtrees.

let treeSum = w => w(0)(n => n)(x => y => x + y);

The client code looks almost the same as the Haskell version.

var v0 = treeSum(Bud);
console.log(v0);

var v1 = treeSum(Leaf(2));
console.log(v1);

var v2 = treeSum(Tree(Leaf(3))(Tree(Leaf(5))(Bud)));
console.log(v2);

The origin of the notation λ in lambda calculus

Here is an extract from Henk Barendregt’s “The Impact of the Lambda Calculus in Logic and Computer Science”.

In Russell and Whitehead's [1910-13] Principia Mathematica the notation for the function f with f(x) = 2x + 1 is 2x̂ + 1. Church originally intended to use the notation x̂.2x + 1. The typesetter could not position the hat on top of the x and placed it in front of it, resulting in ^x.2x + 1. Then another typesetter changed it into λx.2x + 1.

Summary of Philip Wadler’s “Proofs are Programs”

This is my short summary of Philip Wadler’s “Proofs are Programs: 19th Century Logic and 21st Century Computing”.

Proofs and programs are the same thing. But it took a century of effort to recognise the relationship.

  • Modern logic began with Gottlob Frege’s Begriffsschrift in 1879.
  • Gerhard Gentzen introduced natural deduction in 1934.
  • Alonzo Church introduced lambda calculus in 1932. He also introduced a typed version of lambda calculus in 1940.
  • Dag Prawitz showed how to simplify natural deduction proofs directly in 1965.
  • Haskell Curry noted a correspondence between the types of the combinators and the laws of logic as formulated by Hilbert in 1958.
  • W. A. Howard put together the results of Curry and Prawitz, and wrote down the correspondence between natural deduction and lambda calculus in 1969.
  • Howard’s work was published in 1980.

So it took more than 30 years to recognise the correspondence between Gentzen’s natural deduction and Church’s typed lambda calculus. They are the same thing: proofs and programs. This correspondence is called the Curry-Howard isomorphism.

Logicians and computer scientists have discovered exactly the same type systems!

  • Hindley-Milner type system: Hindley (1969), Milner (1978).
  • Girard-Reynolds system: Girard (1972), Reynolds (1974).
  • … and more are still being discovered.

Total Functional Programming

This is my summary of D.A. Turner’s paper “Total Functional Programming”.

1. Introduction

The author suggests a simple discipline of total functional programming designed to exclude the possibility of non-termination (and which is therefore not Turing complete).

A mathematical function must be total, but functions of Haskell and SML are partial because these languages allow unrestricted recursion. For example,

loop :: Int->Int
loop n = 1 + loop n

Passing 0 to loop, we get

loop 0 = 1 + loop 0

By subtracting loop 0 from both sides, we get 0 = 1. This nonsense happens because loop 0 is not an integer despite being of type Int. It is ⊥, the undefined integer.

2. Total functional programming

In total functional programming, ⊥ does not exist. If we have an expression e of type Int, we can be sure that evaluation of e will terminate with an integer value. There are no run-time errors. There are three main advantages of total functional programming:

Simpler Proof Theory

Because every function is a mathematical function, we can do equational reasoning without taking ⊥ into consideration.

Simpler Language Design

The presence of ⊥ complicates the language because it forces a fundamental design choice: whether to make function application strict in its argument.

f ⊥ = ⊥

SML says yes to this while Haskell uses lazy evaluation as the norm. Another example is the & operator on Bool.

True & True = True
True & False = False
False & True = False
False & False = False
⊥ & y = ?
x & ⊥ = ?

Most languages choose the left-strict &, but this choice is arbitrary and breaks the symmetry that & has in logic.
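
As a rough illustration of left-strictness (my example in TypeScript, not the paper’s), the && operator evaluates its left operand first and skips the right operand when the left one is already false:

// A stand-in for ⊥: calling this function never produces a Boolean value.
function bottom(): boolean {
    return bottom();
}

console.log(false && bottom()); // prints false; bottom() is never evaluated
// console.log(bottom() && false); // never produces a value: && is strict in its left argument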

In total functional programming, these semantic choices go away. The split between strict and non-strict languages no longer exists, and the evaluation order does not affect the outcome.

Flexibility of Implementation

In total functional programming, reduction is strongly Church-Rosser, meaning every reduction sequence leads to a normal form and normal forms are unique. This gives the implementer much greater freedom to choose an efficient evaluation strategy without affecting the outcome.

However, total functional programming has disadvantages in that programs are no longer Turing complete and there is no way to write an operating system. The author suggests elementary total functional programming with codata as a solution for these disadvantages.

3. Elementary total functional programming

Elementary means

  1. Type structure no more complicated than Hindley/Milner
  2. Programs and proofs will be kept separate

It is a strongly terminating subset of Haskell (or SML), obtained by imposing three essential restrictions to maintain totality:

All case analysis must be complete.

Where a function is defined by pattern matching, every constructor of the argument type must be covered.

Also make all built-in operations total. For example,

0 / 0 = 0

Make hd total by supplying an extra argument, which is the value to be returned if the list is empty. Or simply don’t use hd and always do a case analysis.
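
As a sketch of the idea (in TypeScript rather than the paper’s notation), a total head function takes the default value to return for the empty list:

// head is total: defaultValue is returned when the list is empty.
function head<T>(defaultValue: T, xs: T[]): T {
    return xs.length === 0 ? defaultValue : xs[0];
}

head(0, []);        // 0
head(0, [1, 2, 3]); // 1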

Type recursion must be covariant.

This means type recursion through the left-hand side of → is prohibited. Otherwise, we could obtain a value foo of type X that is equivalent to loop 0 above.

data Silly = Very (Silly -> X)   -- not allowed!

bad :: Silly -> X
bad (Very f) = f (Very f)
foo :: X
foo = bad (Very bad)

Each recursive function call must be on a syntactic subcomponent of its formal parameter.

Unrestricted general recursion brings back ⊥, so we allow only structural recursion, which is guaranteed to terminate. The author claims that many common algorithms can be written with primitive recursion, though some of them need style changes or intermediate data structures.
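
As a sketch of what this restriction looks like (in TypeScript, with a hypothetical NumList type), the only recursive call is on xs.tail, a syntactic subcomponent of the argument, so the recursion terminates on every finite list:

// An inductively defined list of numbers.
type NumList = { kind: "nil" } | { kind: "cons"; head: number; tail: NumList };

// Structural recursion: the recursive call receives xs.tail, a subcomponent of xs.
function sumList(xs: NumList): number {
    return xs.kind === "nil" ? 0 : xs.head + sumList(xs.tail);
}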

4. Codata

Programming with Codata

To write an operating system in total functional programming, we need infinite lists. But being total means it is not possible to define infinite data structures. Codata is the solution for this dilemma.

codata Colist a = Conil | a <> Colist a

The type Colist contains all the infinite lists as well as the finite ones. We can get the infinite ones by omitting the Conil alternative.

We do primitive corecursion on codata. Ordinary recursion on codata is not legal because it might not terminate. Conversely corecursion is not legal on data because data must be finite.

ones :: Colist Nat
ones = 1 <> ones

fibs :: Colist Nat
fibs = f 0 1
       where
       f a b = a <> f b (a+b)

All these infinite structures are total. Every expression whose principal operator is a coconstructor is in normal form.

Coinduction

The definition of bisimilarity is:

x ≈ y ⇒ hd x = hd y ∧ tl x ≈ tl y

In other words, two pieces of codata are bisimilar if:

  • their finite parts are equal, and
  • their infinite parts are bisimilar.

We can take it as the definition of equality on infinite objects, and perform equational reasoning in proofs.

5. Beyond structural recursion

The author also mentions Walther recursion, a generalisation of primitive recursion.

6. Observations and Concluding Remarks

It may be time to reconsider the decision to choose universality (a language in which we can write all terminating programs and silly programs which fail to terminate) over security.

TypeScript: Best Common Type

TypeScript tries to find the best common type when a type inference is made from several expressions.

Let’s experiment a bit with the following types. Square and Circle are both subtypes of Shape.

class Shape {
    color: string;
}

class Square extends Shape {
    sideLength: number;
}

class Circle extends Shape {
    radius: number;
}

The type of xs1 is Shape[] because Shape is the common supertype of Shape, Square and Circle. Surprisingly, however, the type of xs2 is not Shape[] even though Shape is the common supertype of Square and Circle. The type of xs2 is (Circle | Square)[] because there is no object that is strictly of type Shape in the array.

var xs1 = [new Shape(), new Square(), new Circle()];
var xs2 = [new Square(), new Circle()];

You can override this by annotating the type explicitly as Shape[].

var xs3: Shape[] = [new Square(), new Circle()];

Now let’s see how the return type of a function is inferred when the function returns two values with different types.

function createShape1(square: boolean) {
    if (square) {
        return new Square();
    } else {
        return new Shape();
    }
}

The return type of the createShape1 function is Shape as expected, because Shape is the common supertype of Square and Shape.

function createShape2(square: boolean) { // Type Error
    if (square) {
        return new Square();
    } else {
        return new Circle();
    }
}

However, if we change the return value of the else branch to new Circle(), the createShape2 function no longer type-checks. TypeScript complains with “error TS2354: No best common type exists among return expressions.” It fails to find the best common type even though it could be inferred as either Square | Circle or Shape.

We can satisfy the type checker by giving an explicit type. The return type can be either Square | Circle or Shape.

function createShape3(square: boolean): Square | Circle {
    if (square) {
        return new Square();
    } else {
        return new Circle();
    }
}

function createShape4(square: boolean): Shape {
    if (square) {
        return new Square();
    } else {
        return new Circle();
    }
}

From the examples above, we can see that the type inference rules of TypeScript are not orthogonal. When you mix multiple types in an array literal, the best common type is the union of the types of the element expressions. But when a function returns values of multiple types, the best common type is the first of the return types that is a supertype of each of the others.

You can see the difference in the following type inference rules, taken from the TypeScript 1.4 specification.

Array Type

  • If the array literal is empty, the resulting type is an array type with the element type Undefined.
  • Otherwise, if the array literal is contextually typed by a type that has a property with the numeric name ‘0’, the resulting type is a tuple type constructed from the types of the element expressions.
  • Otherwise, the resulting type is an array type with an element type that is the union of the types of the element expressions.

Function Return Type

  • If there are no return statements with expressions in f’s function body, the inferred return type is Void.
  • Otherwise, if f’s function body directly references f or references any implicitly typed functions that through this same analysis reference f, the inferred return type is Any.
  • Otherwise, if f is a contextually typed function expression (section 4.9.3), the inferred return type is the union type (section 3.4) of the types of the return statement expressions in the function body, ignoring return statements with no expressions.
  • Otherwise, the inferred return type is the first of the types of the return statement expressions in the function body that is a supertype (section 3.10.3) of each of the others, ignoring return statements with no expressions. A compile-time error occurs if no return statement expression has a type that is a supertype of each of the others.

Thoughts on Intersection Types

Facebook’s Flow is a static type checker, designed to find type errors in JavaScript programs. Flow’s type system is similar to that of TypeScript in that it is a gradual type system and relies heavily on type inference to find type errors.

One interesting feature of Flow that is missing in TypeScript is intersection types. For example, in Flow you can use an intersection type to represent a function that takes either a Foo or a Bar.

/* @flow */
class Foo {}
class Bar {}
declare var f: ((x: Foo) => void) & ((x: Bar) => void);
f(new Foo());
f(true); // Flow will error here.

The function f has type ((x: Foo) => void) & ((x: Bar) => void). The notation A & B represents the intersection type of A and B, meaning a value of A & B belongs to both A and B. We can capture this intuition with the following sub-typing rules:

[1] A & B <: A
[2] A & B <: B
[3] if S <: A and S <: B, then S <: A & B

In Flow, intersection types are primarily used to support finitary overloading. For example, func below takes either a number or a string and returns nothing. This can be represented as the intersection of two function types:

type A = ((t: number) => void) & ((t: string) => void)
var func: A;

However, TypeScript doesn’t need intersection types to support finitary overloading because it directly supports function overloading.

interface A {
  (t: number): void;
  (t: string): void;
}
var func: A

Or in TypeScript 1.4, you can represent the same function more succinctly with union types.

interface A {
  (t: number | string): void;
}
var func: A
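
As a quick usage sketch (assuming the func declared above has been given an implementation), both call forms type-check, while any other argument type is rejected:

func(42);        // OK: matches the number case
func("union");   // OK: matches the string case
// func(true);   // Error: boolean is neither number nor string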

This is not a coincidence! Intersection types are the formal dual of union types. When an arrow is distributed over a union, the union changes to an intersection.

(A -> T) & (B -> T)  <:  (A | B) -> T

So it means the following two types are equivalent:

  • (t: number | string): void
  • ((t: number) => void) & ((t: string) => void)

So far, intersection types seem redundant once a language has union types (or vice versa). However, there are some real-world use cases where intersection types are actually required.

Let’s check how ES6’s new function Object.assign is typed in TypeScript (lib.core.es6.d.ts).

interface ObjectConstructor {
    /**
      * Copy the values of all of the enumerable own properties from one or more source objects to a 
      * target object. Returns the target object.
      * @param target The target object to copy to.
      * @param sources One or more source objects to copy properties from.
      */
    assign(target: any, ...sources: any[]): any;
    ...
}

Currently, both the argument types and the return type of Object.assign are any because the type system of TypeScript can’t represent the type correctly. Let’s try to type this function correctly. First, we can make the function polymorphic by giving target the type A and sources the type B[].

Object.assign<A,B>(target: A, ...sources: B[]): ?;

Okay, now we are stuck with the return type. The return value has all the properties of both A and B. With structural typing, this means the return value is a subtype of both A and B (it belongs to both A and B). We need intersection types to represent this value. So the correct signature of this function is:

Object.assign<A, B>(target: A, ...sources: B[]): A & B;
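
With such a signature, client code like the following would type-check, and the result would carry the properties of both the target and the source (a hypothetical example, assuming a compiler that accepts the signature above):

var merged = Object.assign({ a: 1 }, { b: "two" }); // merged: { a: number } & { b: string }
merged.a; // number, contributed by the target
merged.b; // string, contributed by the source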

The same reasoning also applies to mixins.

mixins<A, B>(base: { new(): A }, b: B): { new(): A & B }

TypeScript developers are still discussing adding intersection types. Many people think that there are not enough compelling use cases. Also, intersecting two primitive types such as string & number makes no sense, which makes intersection types less appealing.

I think adding intersection types to TypeScript would make the language more consistent because the two concepts are duals of each other and can’t really be separated. But I also think that language design must be conservative, so let’s wait until we have more compelling reasons to add intersection types.

TypeScript: More on Union Types

In the previous article, I explained the basics of union types in TypeScript.

Though union types resemble the sum types of algebraic data types, they are different. (Confusingly, sum types are sometimes also called “unions” or “disjoint unions” in some languages.) A union type A|B denotes the ordinary union of the set of values belonging to A and the set of values belonging to B, with no added tag to identify the origin of a given element.

With this in mind, let’s look at some properties of TypeScript union types:

  • Identity: A|A is equivalent to A
  • Commutativity: A|B is equivalent to B|A
  • Associativity: (A|B)|C is equivalent to A|(B|C)
  • Subtype collapsing: A|B is equivalent to A if B is a subtype of A (see the sketch below)
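
Here is a minimal sketch of subtype collapsing (Animal and Dog are hypothetical types introduced only for this example):

interface Animal { name: string; }
interface Dog extends Animal { breed: string; }

declare var u: Animal | Dog;
var a: Animal = u;       // OK: both constituents are assignable to Animal
declare var b: Animal;
var v: Animal | Dog = b; // OK in the other direction too, so Animal | Dog is equivalent to Animal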

With sum types, A|A is different from A because it is the sum of A and A (either a left A or a right A). With union types, however, A|A is equivalent to A because there is no tag attached to make the left and right A distinct. This is why union types are also called “non-disjoint union types”, to differentiate them from the “disjoint union types” of ADTs.

Non-disjoint union types differ from disjoint union types in that they lack any kind of case construct. If v is of type A|B, the only members that we can safely access on v are the members that belong to both A and B. This is why we can access only the weight and gears members of the transport object below.

interface Car {
    weight: number;
    gears: number;
    type: string;
}
interface Bicycle {
    weight: number;
    gears: boolean;
    size: string;
}
var transport: Car|Bicycle = /* ... */;
console.log(transport.type); // Error, transport does not have 'type' property

So if the constituent types of a union type have no members in common, there is nothing we can do with the union type. For example, if x is of type number|string|boolean, we can access only the toString and valueOf members, which are common across these three types.
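
As a small sketch of my own, only the members shared by all three constituent types are available on such an x:

declare var x: number | string | boolean;
x.toString(); // OK: toString exists on number, string and boolean
// x.length;  // Error: length exists only on string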

Wait a minute! I’ve just mentioned that union types lack any kind of case construct. But what about TypeScript’s type guards? Aren’t they a kind of case construct that can identify the origin of an element? The answer is both yes and no. Yes, you can inspect the runtime type of a union-typed value with type guards. But no, because nothing prevents you from checking non-constituent types. For example, the following code is perfectly fine in TypeScript:

var x: string | boolean;
if (typeof x === 'number') {
    // ...
}

x is either a string or a boolean, but the type guard here checks whether the type of x is number. In the body of the if statement, TypeScript infers that the type of x is number even though x can never be a number. By contrast, in programming languages with sum types, we can check only the constituent types with pattern matching.