To be empty or to be null, that’s the problem.

Most programming languages including C# and Java use null to represent the absence of value. Null references are convenient when there is no particular value to return as we can simply return null without thinking too much about the consequences.

However, null references are notorious for causing bugs that can be found only at runtime with NullPointerException. Tony Hoare calls it The Billion Dollar Mistake. He admits that null was introduced simply because it was so easy to implement.

One problem I found particularly irritating about null references is that some types have natural encoding of the absence of value. For example, we often represent the absence of a string with an empty string “”. It means we now have two ways to represent the absence of a string: either an empty string or null.

This is the reason why .NET String class provides IsNullOrEmpty. People use both null and String.Empty so we have to check both to be conservative. Checking only one of them often leads a bug.

Having two values that represent the same concept often leads to bugs. Another example is undefined and null in JavaScript. Some prefer null and others prefer undefined. Unless there is a strong convention that can enforced upon the entire JavaScript community, both undefined and null must be checked.

So I think it is crucial to have a single universal way to represent the absence of value. I think Option or Maybe type used in functional programming languages is a good start though hybrid languages such as F# and Scala still have null to interoperate with .NET and JVM respectively.

Value restriction of F#

In F#, we can declare a polymorphic value

> let a = [];; 

val a : 'a list

However, if we make it mutable, fsharpi complains with value restriction error

> let mutable a = [];;

  let mutable a = [];;
  ------------^

/Users/kseo/stdin(1,13): error FS0030: Value restriction. The value 'a' has been inferred to have generic type
    val mutable a : '_a list    
Either define 'a' as a simple data term, make it a function with explicit arguments or, if you do not intend for it to be generic, add a type annotation.
> let a = [];; 

val a : 'a list

This looks unreasonable at first! But let’s see what happens if we get rid of this restriction.

let mutable a = []
let f x = a <- (x :: a)

f(1);;
f(true);;
f(“Hello World”);;

The type of function f is ‘a -> unit. Hence f(1), f(true) and f(“Hello World”) type check. However, a ends up having three different types [“Hello World”; true; 1]. Oops! So, the type system of F# has a rule called value restriction to prevent this kind of bad behaviour.

In fact, all ML family languages have value restriction. See Notes on SML97’s Value Restriction for further information.

Emulating Haskell type classes in F#

One language feature I like most in Haskell is type class. It was originally conceived as a way of implementing overloaded arithmetic and equality operators. It is a clever trick to support ad-hoc polymorphism, which does not require extensive modifications of the compiler or the type system.

F# does not provide type class, but we can emulate it with other F# language features such as operator overloading and inline function. Type Classes for F# shows this trick. Here’s the Functor example taken from the blog:

type Fmap = Fmap with
    static member ($) (Fmap, x:option<_>) = fun f -> Option.map f x
    static member ($) (Fmap, x:list<_>  ) = fun f -> List.map   f x
let inline fmap f x = Fmap $ x <| f

Here, fmap function must be inline because inline functions can have statically resolved type parameters. Without the inline modifier, type inference forces the function to take a specific type. In this case, the compiler can’t decide between option and list and emits an error.

You can use it as in the following:

> fmap ((+) 2) [1;2;3] ;;
val it : int list = [3; 4; 5]
> fmap ((+) 2) (Some 3) ;;
val it : int option = Some 5

I transliterated the TreeRec example of Simon Thompson’s paper, Higher-order + Polymorphic = Reusable into F# by emulating type class in this way.

type Tree<'a> =
    | Leaf
    | Node of 'a * Tree<'a> * Tree<'a>

type LeafClass = LeafClass with
    static member ($) (LeafClass, t:'a list)  = []
    static member ($) (LeafClass, t:Tree<'a>) = Leaf
let inline leaf () : ^R = (LeafClass $ Unchecked.defaultof< ^R> )

type NodeClass = NodeClass with
    static member ($) (NodeClass, l1:'a list)  = fun a l2 -> List.concat [l1; [a]; l2]
    static member ($) (NodeClass, t1:Tree<'a>) = fun a t2 -> Node(a, t1, t2)
let inline node a x1 x2 = (NodeClass $ x1) a x2

type TreeRecClass = TreeRecClass with
    static member ($) (TreeRecClass, l:'a list) = fun f st ->
        let listToTree = function
            | [] -> failwith "listToTree"
            | a::x ->
                let n = List.length x / 2
                let l1 = Seq.take n x |> Seq.toList
                let l2 = Seq.skip n x |> Seq.toList
                (a, l1, l2)
        let rec treeRec' = function
            | [] -> st
            | l ->
                let a, t1, t2 = listToTree l
                let v1 = treeRec' t1
                let v2 = treeRec' t2
                f v1 v2 a t1 t2
        treeRec' l
    static member ($) (TreeRecClass, t:Tree<'a>) = fun f st ->
        let rec treeRec' f st = function
            | Leaf -> st
            | Node(a, t1, t2) -> f (treeRec' f st t1) (treeRec' f st t2) a t1 t2
        treeRec' f st t
let inline treeRec f st x = (TreeRecClass $ x) f st

let inline tSort x =
     // FIXME: Implement sorting!
    let mVal sort1 sort2 v = List.concat [sort1; sort2; [v]]
    let mergeVal sort1 sort2 v t1 t2 = mVal sort1 sort2 v
    treeRec mergeVal [] x

One problem with this approach is that we no longer can group related operations together into a single class. It can’t express that Tree-like type t* has three operations: leaf, node and treeRec. We end up having three distinct types LeafClass, NodeClass and TreeRecClass.

Array Typing Problem in C# (and Java)

In this article, I am going to take a look at array typing problem in C# (and Java).

Before we delve into the details of array typing, let’s look at how we declare an array in C#. An array of T elements is written in T[]. For example, you can declare an array of string as in the following:

string[] strs = new string[] { "hello", "world" };

We can pass this string array around to methods which take an argument of type string[]. For example, we call call Sort method to sort the given strings lexicographically.

static void Sort(string[] strs)
{
   // ...
}

But what if we also want to sort an array of ints? Then we need to create an overloaded method which takes an argument of type int[].

static void Sort(int[] ints)
{
   // ...
}

This code smells bad because we duplicate the exact same code except for the argument type. Of course, we have generics (aka, parametric polymorphism) to solve this problem! However, back in the days before C# 2 or Java 5, there was no generics available and the language designers (James Gosling and Anders Hejlsberg) must solve the problem without generics.

Their choice is to make the array type covariant on its element type. To put it another way, S[] is a subtype of T[] if S is a subtype of T. So string[] is a subtype of object[] because string is a subtype of object. This seems to solve the problem nicely because now we can pass an array of string to a method which takes an argument of type object[]. We don’t redundantly need to duplicate the same code for each argument type.

static void Sort(object[] objs)
{
   // ...
}

string[] strs = new string[] { "hello", "world" };
Sort(strs);

Mission completed? No, there is no free lunch in the world. What about the following code?

static void BadMethod(object[] objs)
{
   objs[0] = new object();
}

string[] strs = new string[] { "hello", "world" };
BadMethod(strs);

Create a console application and try this by yourself. Calling BadMethod with strs throws a System.ArrayTypeMismatchException because C# can’t assign an object instance to an array of strings. This is an example of Liskov substitution principle violation. string[] is a subtype of object[], but it behaves unexpectedly when used where object[] is required.

Fortunately, this at least does not violate the security of .NET runtime. We can’t compromise the runtime by disguising the actual element type. However, we no longer can trust that C# compiler will prevent this kind of problem. By making the array type covariant on its element type, we lost static type safety.

Because C# 2 and Java 5 introduced generics to the languages, we have generics in our hands. So technically we no longer need to make this kind of compromise. However, we can’t fix it now because the change will break the backward compatibility. It’s too late.

Forbidden words

There are many scary sounding terms in functional programming. These terms include: “currying”, “homomorphism”, “existential quantification”, “beta reduction”, “category theory”, “algebraic data type”, “Kleisli arrows”, “Curry–Howard correspondence”, “functor”, “applicative”, “monoid” and “monad”. Most functional programming tutorials try to explain what these terms mean as precisely as possible using even more scary looking mathematical notations.

But what about object oriented programming? OOP also includes many scary sounding terms such as “inheritance polymorphism”, “covariance”, “visitor pattern”, “SOLID”. If we explain what these terms mean as precisely as possible, we encounter the exact same situation as FP. Novice programmers would think that OOP is something very scary.

I recently read the articles of F# for fun and profit. Scott Wlaschin explains functional programming concepts without using any of these scary sounding terms. He even has the list of forbidden words. He especially tries hard to avoid the use of words beginning with letter “m”.

I think this is a good approach. Many programmers are interested in functional programming these days. C# programmers want to learn F# and Java programmers want to learn Scala. But they definitely do not want to learn lambda calculus or category theory from the beginning just to learn a new programming language.

Step by step implementation of Option type in C#

Option type is widely used in functional programming languages such as Haskell, F# and Scala. It is conventional to return an option value if a function can fail. For example, the following parseInt parses the given string into an integer. Because parseInt can fail when the input string is malformed, it returns int option instead of int. The return value is either Some(i) or None.

let parseInt intStr =
    try
        let i = System.Int32.Parse intStr
        Some(i)
    with _ -> None

The caller of the function must perform pattern matching on the option value and handle both cases explicitly.

let parseIntWithDefault v def = 
    match parseInt v with
    | Some i -> i
    | None -> def

C# does not provide an Option type, but we can emulate it with the help of C# generics and delegates. Let’s start with the abstract base class Option.

    public abstract class Option<T>
    {
        public abstract T Value { get; }
        public abstract bool IsSome { get; }
        public abstract bool IsNone { get; }
    }

Value property returns the underlying value of type T. IsSome and IsNone properties return a boolean flag to indicate whether this value is Some or None.

Option has two subclasses: Some and None. Both classes are sealed because we don’t want these classes to be extended. This is how we cruelly emulate algebraic data type in C#.

    public sealed class None<T> : Option<T>
    {
        public override T Value
        {
            get { throw new System.NotSupportedException("There is no value"); }
        }

        public override bool IsSome { get { return false; } }
        public override bool IsNone { get { return true; } }
    }

    public sealed class Some<T> : Option<T>
    {
        private T value;

        public Some(T value)
        {
            if (value == null)
            {
                throw new System.ArgumentNullException("value", "Some value was null, use None instead");
            }

            this.value = value;
        }

        public override T Value { get { return value; } }
        public override bool IsSome { get { return true; } }
        public override bool IsNone { get { return false; } }
    }

The implementation is trivial. Value property of None throws an exception because there is no value to return. The constructor of Some throws an exception when the input parameter is null because Some is never null.

The caller can use Option type as in the following:

var v = new Some<int>(0);
if (v.IsSome)
    System.Console.WriteLine(v.Value);
else
    System.Console.WriteLine("None");

This code looks okay at first, but we’ve just lost the power of pattern matching. We no longer can force the caller to handle both Some and None cases. The caller can easily ignore to check IsSome property and just retrieve the underlying value with Value property.

To fix this problem, let’s emulate pattern matching in C# with Match method.

    public abstract class Option<T>
    {
        // Other methods ...
        public abstract TResult Match<TResult>(Func<T, TResult> someFunc, Func<TResult> noneFunc);
    }

Match takes two delegates which handle each case respectively.

    public sealed class None<T> : Option<T>
    {
        // Other methods ...
        public override TResult Match<TResult>(Func<T, TResult> someFunc, Func<TResult> noneFunc)
        {
            return noneFunc();
        }
    }

    public sealed class Some<T> : Option<T>
    {
        // Other methods ...
        public override TResult Match<TResult>(Func<T, TResult> someFunc, Func<TResult> noneFunc)
        {
            return someFunc(value);
        }
    }

With Match method, the caller must pass delegates to handle both Some and None cases explicitly.

var v = new Some<int>(0);
var str = v.Match(
    v  => v.ToString(),
    () => "None"
);
Console.WriteLine(str);

Other methods of Option such as Fold and Map can be easily added. See Option.cs for the full source code with Map method added.

This is just a proof-of-concept implementation. Please refer to Functional C# if you are looking for a full featured functional programming library for C#.

Scala Option.fold vs Option.map/getOrElse

Scala Option offers two different ways to handle an Option value.

Option.map/getOrElse

val name: Option[String] = request getParameter "name"
name map { _.toUpperCase } getOrElse ""

Option.fold

val name: Option[String] = request getParameter "name"
name.fold("") { _.toUpperCase }

On the spark-dev mailing list, there was a discussion on using Option.fold instead of Option.map/getOrElse combination.

Two idioms look almost the same, but people seem to prefer one over the other for readability reasons. Here is the summary of the discussion:

  • Option.getOrElse/map is a more idiomatic Scala code because Option.fold was only introduced in Scala 2.10.
  • Fold on Option is not obvious to most developers.
  • Option.fold is not readable.
    • Reverses the order of Some vs None.
    • Putting the default condition first there makes it not as intuitive.
    • When code gets long, the lack of an obvious boundary with two closures is confusing. (“} {” compared to getOrElse)
  • Fold is a more functional idiom in general.

It seems people are getting used to functional idioms such as map and filter, but still are reluctant to accept more functional idioms such as Option.fold.

I prefer Option.getOrElse/map because I think putting the default value first is not intuitive and much of Scala code is already written with Option.getOrElse/map. However, both options are fine as long as only one style is used through the project. Consistency is more important than taste!