Wednesday, May 21, 2008

Haskell and Performance

I have just read Haskell and Performance via Planet Haskell. I haven't been following the recent discussions on the mailing lists that Neil refers to, but I did spend a good year or so working part-time on a pet project I had in Haskell.

Some Haskell libraries are poorly optimised
At one stage I wanted to use bitwise operations for performance. I tried Data.Bits, but that didn't help a great deal. Then I googled around and found a post about the implementation not being particularly fast.

Haskell's multi-threaded performance is amazing
I did not find this the case at all. I tried forkIO and friends as well as STM. From memory, the STM version was elegant, perhaps even beautiful, but not fast enough. I think the issue had to do with laziness - thunks not being evaluated in the worker threads - which I spent considerable time trying to solve.

Reading the Core is not easy
Some documentation I read suggested reading the core, as well as comments from various people. I tried it, but generally couldn't understand it enough to be of use.

Optimisation without profiling is pointless
I agree with the point that optimising for performance is a waste of time if not necessary. Secondly, using GHC profiling was very helpful.

Final Thoughts
These comments are really only relevant to me and the codebase I was working on and not a generalisation for others. In fairness, when I started my project, I didn't know Haskell or functional programming in general. I had spent the previous years predominantly in Java. However I spent over a year working part-time on my project and in that time it went through many changes and a complete re-write.

For a portion of the project, performance was critical (in the end it was running on about 14 cores over 5 machines). I expended a significant percentage of my time on the project battling with optimising the Haskell code. I tried strictness annotations in the places where strictness would seem to be useful, arrays, unboxing, etc. In the end, I could not build a deterministic mental model of how applying these techniques would affect the program at runtime, especially when using GHC with -02.

To solve the problem, I rewrote the performance critical portion in Java (I would have used Scala, but did not wish to spend the time getting up to speed with it at that point). My fingers got sore typing the amount of Java code necessary to do even the simplest of things in Haskell (eg. partial application). When finished, the Java version ran many times faster than the Haskell one (I wish I could remember by how much) and furthermore, I could actually reason about the impact on performance when making changes to the Java version.

Hopefully I can fix my gap in knowledge about performance optimisation in Haskell, so I don't have to resort to Java again.

No comments: