Kwang Yul Seo

Programming a better world with Haskell

Scala Option.fold vs Option.map/getOrElse

Scala Option offers two different ways to handle an Option value.

Option.map/getOrElse

val name: Option[String] = request getParameter "name"
name map { _.toUpperCase } getOrElse ""

Option.fold

val name: Option[String] = request getParameter "name"
name.fold("") { _.toUpperCase }

On the spark-dev mailing list, there was a discussion on using Option.fold instead of Option.map/getOrElse combination.

Two idioms look almost the same, but people seem to prefer one over the other for readability reasons. Here is the summary of the discussion:

  • Option.getOrElse/map is a more idiomatic Scala code because Option.fold was only introduced in Scala 2.10.
  • Fold on Option is not obvious to most developers.
  • Option.fold is not readable.
    • Reverses the order of Some vs None.
    • Putting the default condition first there makes it not as intuitive.
    • When code gets long, the lack of an obvious boundary with two closures is confusing. (“} {” compared to getOrElse)
  • Fold is a more functional idiom in general.

It seems people are getting used to functional idioms such as map and filter, but still are reluctant to accept more functional idioms such as Option.fold.

I prefer Option.getOrElse/map because I think putting the default value first is not intuitive and much of Scala code is already written with Option.getOrElse/map. However, both options are fine as long as only one style is used through the project. Consistency is more important than taste!

Fay + Node.js

Fay is a proper subset of Haskell that compiles to JavaScript. Thus it is by definition a statically typed lazy pure functional language. If you want a more thorough introduction to Fay, please read Paul Callaghan’s Web Programming in Haskell and Oliver Charles’s 24 Days of Hackage: fay.

The original intention of Fay is to use Haskell on the client side. If you use a Haskell web framework such as Yesod or Snap, using Fay you can use the same language on both client and server sides and some code can actually be shared.

However, because Fay is simply a subset of Haskell that compiles to JavaScript with no dependencies on the client side, you can use it on the server side too in combination with Node.js. I am not saying it is actually a good idea to write server code in Fay, but it is at least fun to investigate the feasibility. Here is a web server example written in Fay.

{-# LANGUAGE EmptyDataDecls #-}
module Hello where

EmptyDataDecls is required because JavaScript types are represented by empty data declarations in Fay.

import FFI

FFI module provides a foreign function interface.

data Http
data HttpServer
data Request
data Response

Http, HttpServer, Request, Response are JavaScript types we use in this example. They are represented by empty data declarations.

requireHttp :: Fay Http
requireHttp = ffi "require('http')"

This is a simple example of a FFI declaration. It returns the result of require(‘http’) as a Http instance. Fay is a monad which is similar to IO monad. Because a FFI function often has side effects, Fay monad is used to represent this.

 
createServer :: Http -> (Request -> Response -> Fay ()) -> Fay HttpServer
createServer = ffi "%1.createServer(%2)"

consoleLog :: String -> Fay ()
consoleLog = ffi "console.log(%1)"

listen :: HttpServer -> Int -> String -> Fay ()
listen = ffi "%1.listen(%2, %3)"
 
writeHead :: Response -> Int -> String -> Fay ()
writeHead = ffi "%1.writeHead(%2, %3)"
 
end :: Response -> String -> Fay ()
end = ffi "%1.end(%2)"

These FFI declarations use %1, %2 that corresponds to the arguments we specify in the type. Most Fay types are automatically serialized and deserialized. Note that we can only use point free style in FFI functions.

 
main :: Fay ()
main = do
  http <- requireHttp
  server <- createServer http (\req res -> do
    writeHead res 200 "{ 'Content-Type': 'text/plain' }"
    end res "Hello World\n"
    )
  listen server 1337 "127.0.0.1"
  consoleLog "Server running at http://127.0.0.1:1337/"

main is the entry point to our web server example. Its return type is Fay () because a Fay program can’t do anything without interacting with the world outside. Because we already wrapped all the Node.js APIs we use, we can program as if we write a normal Haskell program.

Compare our Fay web server program with the original Node.js program. Except for the FFI bindings, the main code is almost the same as before. However, our version is much more type-safe!

var http = require('http');
http.createServer(function (req, res) {
  res.writeHead(200, {'Content-Type': 'text/plain'});
  res.end('Hello World\n');
}).listen(1337, '127.0.0.1');
console.log('Server running at http://127.0.0.1:1337/');

A Hindley-Milner type inference implementation in OCaml

Hindley–Milner (HM) is a classical type system for the lambda calculus with parametric polymorphism. If you want to know what HM is and why it is cool, please read Daniel Spiewak’s article “What is Hindley-Milner? (and why is it cool?)

There are multiple implementations of HM in many different languages.

I reimplemented the algorithm in OCaml guided by Scala and Python implementations. All these implementations are based on the Modula-2 implementation of the Cardelli 1987 paper. Because the algorithm described in the paper depends on mutable states for optimization, it is not easy to implement it directly in a purely functional programming language like Haskell.

The program implements a small functional programming language with no concrete syntax. The language consists of identifier, lambda, application, let and let rec. The type inference algorithm reconstructs the type of the given example expressions in the context of some predefined types. The program produces the following results when executed:

letrec factorial = fn n => cond zero n 1 times n factorial pred n in factorial 5 : int
fn x => pair x 3 x true : Type mismatch bool != int
pair f 4 f true : Undefined symbol f
let f = fn x => x in pair f 4 f true : (int * bool)
fn f => f f : Recursive unification
let g = fn f => 5 in g g : int
fn g => let f = fn x => g in pair f 3 f true : (a -> (a * a))
fn f => fn g => fn arg => g f arg : ((c -> d) -> ((d -> b) -> (c -> b)))

Installing OCaml+OPAM+utop+Core Library on Ubuntu Saucy

Install OCaml+OPAM

There are PPAs available that are pinned to specific revisions of OCaml and OPAM.

add-apt-repository ppa:avsm/ppa
apt-get update
apt-get install ocaml opam

Or you can install pre-compiled versions using Binary installer.

wget https://raw.github.com/ocaml/opam/master/shell/opam_installer.sh
sh ./opam_installer.sh /usr/local/bin

Install utop

utop is an improved toplevel for OCaml. You can install utop using opam installl.

opam install utop

Install Core library

Core library is Jane Street’s alternative to the standard library. You can install Core library using opam install

opam install core
opam install async
opam install core_extended

Put the following in your .ocamlinit file to use Core and its associated libraries and syntax extensions in the toplevel:

#use “topfind”
#thread
#require “dynlink”
#camlp4o
#require “bin_prot.syntax”
#require “sexplib.syntax”
#require “variantslib.syntax”
#require “fieldslib.syntax”
#require “comparelib.syntax”
#require “core”
#require “async”
#require “core_extended”
#require “core.top”
open Core.Std

A Critique of Homoiconicity

Homoiconicity is a property of Lisp in which program and data have the same representation, namely S-expressions. This gives Lisp uniquely strong power to write programs that manipulate programs, such as interpreters and compilers. But it also makes it easy for a novice programmer to be confused between program and data.

Philip Wadler described several such confusions in his paper, “A critique of Abelson and Sussman – or – Why calculating is better than schemin“. The original paper compared Scheme and Miranda, but I changed them to more modern languages Clojure and Haskell respectively.

Clojure lists are not self-quoting

In Clojure, lists are not self-quoting while numbers as data are self-quoting. For example, 3 evaluates to 3 itself, but (1 2 3) evaluates as a function call, not as a list. To include list as a datum, one must quote it as in (quote (1 2 3)) or ‘(1 2 3).

user=> 3
3
user=> (1 2 3)
ClassCastException java.lang.Long cannot be cast to clojure.lang.IFn user/eval5 (NO_SOURCE_FILE:4)
user=> (quote 1 2 3)
1
user=> ‘(1 2 3)
(1 2 3)

When combined with the substitution model to explain the evaluation of (list (list 1 2) nil), this is quite confusing because the intermediate steps are no longer legal Lisp expressions.

(list (list 1 2) nil)
-> (list (1 2) nil)
-> (list (1 2) ())
-> ((1 2) ())

In Haskell, one just writes [[1,2], []]. It is already a value, so there is no evaluation to explain.

Further confusion with quote

Surprisingly, (peek ”abracadabra) evaluates to quote.

user=> (peek ”abracadabra)
quote

because (peek ”abracadabra) expands to (peek (quote (quote abracadabra))) which evaluates as in

(peek (quote (quote abracadabra)))
-> (peek (quote abracadabra))
-> quote

Evaluating too little or too much

As Clojure represents program and data in the same form, one often mistakenly evaluates either too little or too much.

(peek (quote (a b)))

The result of evaluating the expression above is a. However, many programmers give the answer quote (too little) or the value of the variable a (too much).

In Haskell, no such problem arises because there is no quote.

Other confusions with lists

A list containing just one element x is (list x) or (x) in Clojure, while it is [x] in Haskell. As people tend to drop parenthesis, (x) often mistakenly read as x.

People also get confused between cons and list because (cons x y) and (list x y) in Clojure look similar while x:y and [x, y] in Haskell look distinct.

These are just notational differences, but some notations certainly make people more confused than others.

Syntax

Programming language research does not talk much about syntax because the preference of one syntax over the others depends much on the taste of a programmer. But the famous S-expression certainly hinders reasoning with algebraic properties, such as associativity. Also most people who are accustomed to infix notations taught at math classes find it hard read S-expressions at first.

Conclusion

A Lisp family language such as Clojure traditionally has been regarded as a good introductory language to students or novice programmers thanks to its simple syntax (or lack of) and semantics. But one of its powerful language feature, homoiconicity is a mixed blessing for beginners as it makes it hard to reason about programs. Maybe this is the reason why other functional programming languages such as OCaml, F#, Scala and Haskell end up not including homoiconicity in the language.

Functions are Objects in Scala

Scala is a multi-paradigm language which supports both object-oriented programming and functional programming. So Scala has both functions and objects. But in the implementation level, function values are treated as objects. The function type A => B is just an abbreviation for the class scala.Function1[A, B],

package scala
trait Function1[A, B] {
    def apply(x: A): B
}

There are traits Function2, …, Function22 for functions which take more arguments.

An anonymous function such as (x: Int) => x * x is expanded to

new Function1[Int, Int] {
    def apply(x: Int) = x * x
}

A function call such as f(a, b) is expanded to f.apply(a, b).

So the translation of

val f = (x: Int) => x * x
f(7)

would be

val f = new Function1[Int, Int] {
    def apply(x: Int) = x * x
}
f.apply(7)

This trick is necessary because JVM does not allow passing or returning functions as values. So to overcome the limitation of JVM (lack of higher order functions), Scala compiler wraps function values in objects.

The following method f is not itself a function value.

def f(x: Int): Boolean = ...

But when f is used in a place where a Function type is expected, it is converted automatically to the function value

(x: Int) => f(x)

This is an example of eta expansion.

The code examples in this article are taken from Martin Odersky’s Functional Programming Principles in Scala lecture.

Classes and Substitutions in Scala

In functional programming, it is conventional to define the meaning of a function application using a computation model based on substitution. In Scala, the meaning of classes and objects are also defined using substitution model. Let’s assume that we have a class definition,

class C(x1; …; xm){ … def f(y1; …; yn) = b … }

  • The formal parameters of the class are x1; …; xm.
  • The class defines a method f with formal parameters y1; …; yn.

Then, the expression new C(v1; …; vm).f(w1; …; wn) is rewritten to

[w1/y1; …; wn/yn][v1/x1; …; vm/xm][new C(v1; …; vm)/this]b

  • the substitution of the formal parameters y1; …; yn of the function f by the arguments w1; …; wn,
  • the substitution of the formal parameters x1; …; xm of the class C by the class arguments v1; …; vm,
  • the substitution of the self reference this by the value of the object new C(v1; …; vn)

Martin Odersky explained the evaluation of Scala classes and objects using this model in his lecture, Functional Programming Principles in Scala.

Evaluation Strategy: Haskell vs Scala

Haskell is a non-strict language, and GHC uses a strategy called laziness which combines non-strictness and sharing for efficiency.

Thus, you can easily implement const which never uses the second argument.

const x y = x

With this definition, it is okay to pass undefined as the second argument of const because y is not never evaluated. But in Haskell, you can also make an argument strict using the BangPatterns GHC extension.

const x !y = x

Interestingly, the situation is reversed in Scala whose default evaluation strategy is strict.

def const(x: Int, y:Int) = x

You can make an argument non-strict by putting the => symbol between the variable name and the type.

def const(x: Int, y: => Int) = x

Functional Programming for Parallelism and Concurrency

One reason why functional programming is gaining more attentions from main stream developers is that it is a promising technology which might solve so called “PPP (popular parallel programming) grand challenge”.

Our current situation looks doomed. Moore’s law is being achieved by increasing # of cores not clock cycles and huge volume workloads requires horizontal scaling. We are hopeless because mutable states in imperative programming combined with parallel processing result in non-determinism and it is not easy to cope with non-determinism in programming.

One possible way to solve this problem is to avoid mutable states. There is no non-determinism without mutable states. There is no race condition or dead lock without mutable states. Mutable states are the root of the problem. Martin Odersky explained how functional programming can solve this problem in his OSCON 2011 Java keynote presentation.

The age of type inference

Main stream languages have been continuously adopting good ideas from functional programming. Java and C# has gradually added programming constructs such as garbage collection, parametric polymorphism, lambda expressions and closures.

In spit of these ongoing endeavors, dynamically typed languages such as Python, Ruby, JavaScript and even Clojure are gaining popularity over statically typed languages. These languages are concise and elegant, and you don’t need to spend your precious time writing complex type annotations.

To regain the power stolen by dynamic languages, statically typed main stream languages have begun to introduce local type inference. For example, C++11 added auto keyword, which directs the compiler to use the initialization expression of a declared variable to deduce its type. Java and C# also added similar constructs.

#include <iostream>
 
using namespace std;
 
int main( )
{
    int count = 10;
    int& countRef = count;
    auto myAuto = countRef;
 
    countRef = 11;
    cout << count << " ";
 
    myAuto = 12;
    cout << count << endl;
}

Unfortunately, the power of type inference in these languages are very limited. Because type inference is tightly coupled with type system, we can’t introduce global type inference such as Hindler-Milner to these languages. Because of compatibility, it is practically impossible to retrofit the type system to allow global type inference.

I think it is time to design and implement a new statically typed language which looks almost the same to dynamic typed languages. Developers from Python, Ruby and JavaScript must be able to use it without learning how to put types all over the program. Dynamic vs Static typing controversy is meaningless if you don’t need to annotate any type in both languages. A statically typed language is always better because it detects errors at compile time for free.

Daniel Spiewak expects in his talk that statically typed languages with structural type systems (necessary for type inference) will dominate in the future and I totally agree with him.

Follow

Get every new post delivered to your Inbox.

Join 61 other followers