Encoding algebraic data types in JavaScript

JavaScript lacks algebraic data types, but we can emulate them because JavaScript has first-class functions.

Here’s a simple Haskell program declaring a Tree data type with Bud, Leaf and Tree data constructors. treeSum calculates the sum of the leaf nodes by visiting every node in the given tree.

data Tree = Bud | Leaf Int | Tree Tree Tree deriving Show

treeSum :: Tree -> Int
treeSum Bud = 0
treeSum (Leaf n) = n
treeSum (Tree t1 t2) = treeSum t1 + treeSum t2

main = do
	let t0 = treeSum Bud
	putStrLn $ show t0
	let t1 = treeSum (Leaf 2)
	putStrLn $ show t1
	let t2 = treeSum (Tree (Leaf 3) (Tree (Leaf 5) Bud))
	putStrLn $ show t2

In JavaScript, we can emulate the data constructors of Tree with higher-order functions. The trick is to pass b, l and p as arguments to the data constructors. Each data constructor uses only one of these three arguments:

  • b is a value for Bud.
  • l is a function taking an integer n as an argument and returning a value for Leaf.
  • p is a function taking the values computed for the left and right subtrees as arguments and returning a value for Tree.

(ES6 let and arrow functions are used for conciseness.)

let Bud = b => l => p => b;
let Leaf = n => b => l => p => l(n);
let Tree = t1 => t2 => b => l => p =>
           p(t1(b)(l)(p))(t2(b)(l)(p));

You can think of Bud as a value by itself. For Leaf and Tree, you can create each node by partially applying arguments.

var t0 = Bud;
var t1 = Leaf(2);
var t2 = Tree(Leaf(2))(Bud);

With this encoding scheme, defining treeSum is trivial. You just need to pass three arguments to handle Bud, Leaf and Tree respectively: a value for Bud and a function for each of Leaf and Tree. If w is Bud, treeSum returns 0. If w is a Leaf node, it returns n. If w is a Tree node, it returns the sum of the left and right subtrees.

let treeSum = w => w(0)(n => n)(x => y => x + y);
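
To see why this works, here is a rough sketch of how treeSum unfolds on a small tree (written as comments):

// treeSum(Tree(Leaf(3))(Bud))
//   = Tree(Leaf(3))(Bud)(0)(n => n)(x => y => x + y)
//   = (x => y => x + y)(Leaf(3)(0)(n => n)(x => y => x + y))
//                      (Bud(0)(n => n)(x => y => x + y))
//   = (x => y => x + y)((n => n)(3))(0)
//   = 3 + 0
//   = 3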

The client code looks almost the same as the Haskell version.

var v0 = treeSum(Bud);
console.log(v0);

var v1 = treeSum(Leaf(2));
console.log(v1);

var v2 = treeSum(Tree(Leaf(3))(Tree(Leaf(5))(Bud)));
console.log(v2);
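
The same arguments-as-handlers scheme supports other folds just as easily. As a further illustration (not part of the original example), counting leaf nodes only requires choosing different handlers:

let leafCount = w => w(0)(n => 1)(x => y => x + y);

console.log(leafCount(Tree(Leaf(3))(Tree(Leaf(5))(Bud)))); // 2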

To be empty or to be null, that’s the problem.

Most programming languages, including C# and Java, use null to represent the absence of a value. Null references are convenient when there is no particular value to return, as we can simply return null without thinking too much about the consequences.

However, null references are notorious for causing bugs that surface only at runtime as a NullPointerException. Tony Hoare calls it The Billion Dollar Mistake. He admits that null was introduced simply because it was so easy to implement.

One problem I find particularly irritating about null references is that some types already have a natural encoding of the absence of a value. For example, we often represent the absence of a string with the empty string “”. This means we now have two ways to represent a missing string: an empty string or null.

This is why the .NET String class provides IsNullOrEmpty. People use both null and String.Empty, so we have to check both to be safe. Checking only one of them often leads to a bug.

Having two values that represent the same concept often leads to bugs. Another example is undefined and null in JavaScript. Some prefer null and others prefer undefined. Unless there is a strong convention that can be enforced across the entire JavaScript community, both undefined and null must be checked.
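
As a minimal sketch (the helper name isMissing is mine, not a standard API), one way to handle both at once:

// Treat null and undefined uniformly as "no value".
let isMissing = v => v === null || v === undefined;
// Loose equality does the same in one comparison:
// v == null is true exactly when v is null or undefined.

console.log(isMissing(null));      // true
console.log(isMissing(undefined)); // true
console.log(isMissing(""));        // false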

So I think it is crucial to have a single, universal way to represent the absence of a value. The Option or Maybe type used in functional programming languages is a good start, though hybrid languages such as F# and Scala still have null in order to interoperate with .NET and the JVM respectively.
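
As an illustration only (my sketch, not from the post), a Maybe type can even be encoded in JavaScript with the same trick used for Tree above:

// Nothing returns the default d; Just(x) ignores the default and applies j to x.
let Nothing = d => j => d;
let Just = x => d => j => j(x);

// Extract the wrapped value, or fall back to a default when it is absent.
let getOrElse = def => m => m(def)(x => x);

console.log(getOrElse("<absent>")(Nothing));    // <absent>
console.log(getOrElse("<absent>")(Just("hi"))); // hi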