Thursday, December 4, 2008

What makes the Camp and Darcs VCSs unique

In short, not necessarily viewing a VCS repository from a purely chronological/historical perspective, but from a change perspective.

From the Camp website via reddit.

Thursday, November 27, 2008

Lock screen via keyboard on Mac OS X

I want to easily lock the screen on Mac OS X, preferably without using the mouse or having to install any other utilities. Assuming the security setting requiring a password to wake from sleep or screensaver is set, here is what I have found:

  • Control-Eject key sequence will bring up the Restart, Sleep, Shutdown dialog window. Press s to put the computer to sleep. From comments on The Design of the Mac OS X Shutdown Feature via Keyboard shortcut of the day.

  • Add /System/Library/Frameworks/Screensaver.framework/Versions/A/Resources/ to the Dock (via Quickly lock your screen). Unfortunately Spotlight doesn't seem to find this app. However, the keyboard shortcut to open the Dock is Control-F3, and typing sc selects the ScreenSaverEngine.

Update: Command-Alt-Eject will put the Mac to sleep. Thanks to Kristian.

Friday, November 21, 2008

VirtualBox - OS Virtualisation on the Mac

Parallels and VMware Fusion both look like good OS virtualisation solutions for the Mac. However, with the current Aussie dollar, the US$80 licence fee (same price for both of them) is quite expensive, especially if you also need to acquire a licence for Windoze as a guest OS.

I stumbled across VirtualBox, which is now owned by Sun. It is touted as:

... the only professional solution that is freely available as Open Source Software under the terms of the GNU General Public License (GPL).

This can be a little misleading, as there is a dual licensing model, with various useful features only available in the closed-source version.

The closed-source version is available under a Personal use and evaluation license. The strange, but good, thing is that it seems OK to use this version if you want to install it on your machine at work. From point 6 in the licensing FAQ:
Also, if you install it on your work PC at some large company, this is still personal use. However, if you are an administrator and want to deploy it to the 500 desktops in your company, this would no longer qualify as personal use.

So on that basis, I installed it on my iMac at work and it was trivial to get an Ubuntu 8.04 Desktop guest VM up and running. The bundled user manual provided instructions on installing the Guest Additions, which improved the interaction between host and guest OS.

Now if I could only find a version of Windows XP licensed in a similar manner. :-)

Wednesday, November 12, 2008

Jungle Disk R.I.P.

In the past I have tried to manage my files by storing them in Git and keeping a central repository on Amazon S3 via Jungle Disk. I was using this system from two different Macs and a Linux machine and when it worked, it worked well, but the intermittent corruption/synchronisation issues were a killer.

My fallback plan was to store the central repositories on encFS on a USB memory stick instead of Jungle Disk over S3. I tried with the USB stick formatted as FAT32 and then ext3 but that also had intermittent failures.

Over the last few weeks I have ditched the linux machine and stored the repositories in an encrypted sparse disk image on the USB stick formatted as Mac OS Extended (Journaled). It has worked very well so far!

Tuesday, November 11, 2008

Anders Hejlsberg and Guy Steele: Concurrency and Language Design

An insight into the world of some professional language designers:

Anders Hejlsberg and Guy Steele: Concurrency and Language Design

The theme of restricting side-effects and making them explicit so that pure code can be identified was also present in Erik Meijer's keynote at JAOO Brisbane 2008 and various talks given by Simon Peyton Jones.

Friday, November 7, 2008

Programming to the lowest common denominator

I regularly observe programming to the lowest common denominator in software development teams. It is especially prevalent in teams associated with large organisations.

What do I mean by programming to the lowest common denominator?

Taking a hit for the team

There is currently great emphasis on team members being able to easily understand each other's code. I am not going to comment on that idea itself, just a possible consequence of it:

I don't know where you work, but on our team of 15 or so developers there's no room for a person who writes code the others can't read. Some of us are smart enough to do so, but all of us are smart enough to know that we shouldn't.
From a thread on a local user group.

So here we have a case where if you could write some code that is "better", but is not "readable" by other team members, then you can't write it. What happens if there is a significant difference between the least and most competent programmers on the team (and I observe that there often is)?

Business continuity

Another pervasive force is from management. Business is generally concerned with the risk that the original developer(s) of a particular software application won't be available one day and so they wish to be able to find others who can take over in a timely and cost-effective manner. So business standardises on what they perceive as common skills, tools and languages, which by definition of being common enough is a low standard.

If the previous two points hold, then we now have a situation where business chooses a low standard and then team members are expected to work in a low subset of the low standard. Sounds like fun!

I am going to assume that some of the members of these teams are highly intelligent and care a great deal about how well they do their work. Now they know, at least subconsciously, that they have to work within these constraints, otherwise they won't have that job. I observe that some of them deal with that situation by focusing their attention on just the possibilities that are actually open to them (and perhaps just a little beyond the boundaries). This is actually a very small world, compared to all the possible programming skills, languages, tools, etc.

The sad part, though, is when people have internalised that (relatively) little world and they start to emotionally/irrationally defend it and even evangelise it (like in the post above). I admit to being in this situation in the past - you can debate the intelligent part. :-)

Friday, October 31, 2008

Infinite loops in Haskell

I have written about my performance problems with Haskell before.

Infinite loops in Haskell gives an example of some of the subtle "fun" that can be had.

Wednesday, October 29, 2008

Haskell monads

Some weeks ago, I spent a couple of days working through some Haskell with a colleague of mine. This is some of the stuff we went through.

  1. Haskell exercises for beginners to get back in the swing of things.

  2. Write implementations for:
    mflatten :: Maybe (Maybe a) -> Maybe a
    mmap :: (a -> b) -> Maybe a -> Maybe b

    lFlatMap :: [] a -> (a -> [] b) -> [] b
    mFlatMap :: Maybe a -> (a -> Maybe b) -> Maybe b

  3. Observe lFlatMap and mFlatMap can be written with the same pattern. How can we reduce the "duplication"?

  4. Derive flatten from flatMap:
    class FlatMap f where
      flatMap :: f a -> (a -> f b) -> f b

    flatten' :: (FlatMap f) => f (f a) -> f a
    flatten' = error "todo"

  5. Given pure:
    class Pure p where
      pure :: a -> p a
    and flatMap, derive map:
    map' :: (FlatMap f, Pure f) => f a -> (a -> b) -> f b
    map' t f = error "todo"

  6. Let's give the combination of flatMap and pure a name, perhaps Monad:
    class Monad' m where
      flatMap :: m a -> (a -> m b) -> m b
      pure :: a -> m a
    A diagram to describe the relationships between our functions so far:

  7. Write the following functions:
    liftM0 :: (Monad' m) => a -> m a
    liftM1 :: (Monad' m) => (a -> b) -> m a -> m b
    liftM2 :: (Monad' m) => (a -> b -> c) -> m a -> m b -> m c
    liftM3 :: (Monad' m) => (a -> b -> c -> d) -> m a -> m b -> m c -> m d
    ap :: (Monad' m) => m (a -> b) -> m a -> m b
    sequence' :: (Monad' m) => [m a] -> m [a]
    mapM' :: (Monad' m) => (a -> m b) -> [a] -> m [b]

    • liftM0 and liftM1 are just pure and map respectively

    • liftMn for n >= 2 can be written as:

      • n x flatMap + pure

      • (n - 1) x flatMap + map

      • liftM(n - 1) + flatMap

      • liftM(n - 1) + ap

    • sequence' and mapM' can each be written in terms of the other

  8. Finally, note the Monad laws.
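
For reference, here is one possible set of answers to steps 4 to 7. It is only a sketch: it uses the single Monad' class from step 6 throughout (rather than the separate FlatMap and Pure classes), hides the Prelude's own pure to avoid a name clash, and adds Maybe and list instances purely so there is something to try it on. Each liftMn follows just one of the decompositions listed above; the others work equally well.

```haskell
import Prelude hiding (pure)

-- The class from step 6.
class Monad' m where
  flatMap :: m a -> (a -> m b) -> m b
  pure    :: a -> m a

-- Instances added here only so the functions below can be exercised.
instance Monad' Maybe where
  flatMap Nothing  _ = Nothing
  flatMap (Just x) f = f x
  pure = Just

instance Monad' [] where
  flatMap xs f = concat (map f xs)
  pure x = [x]

-- Step 4: flatten is flatMap with the identity function.
flatten' :: Monad' m => m (m a) -> m a
flatten' t = flatMap t id

-- Step 5: map is flatMap followed by pure.
map' :: Monad' m => m a -> (a -> b) -> m b
map' t f = flatMap t (pure . f)

-- Step 7.
liftM0 :: Monad' m => a -> m a
liftM0 = pure

liftM1 :: Monad' m => (a -> b) -> m a -> m b
liftM1 f t = map' t f

-- (n - 1) x flatMap + map, with n = 2.
liftM2 :: Monad' m => (a -> b -> c) -> m a -> m b -> m c
liftM2 f ma mb = flatMap ma (\a -> map' mb (f a))

ap :: Monad' m => m (a -> b) -> m a -> m b
ap mf ma = flatMap mf (\f -> map' ma f)

-- liftM(n - 1) + ap, with n = 3.
liftM3 :: Monad' m => (a -> b -> c -> d) -> m a -> m b -> m c -> m d
liftM3 f ma mb mc = liftM2 f ma mb `ap` mc

sequence' :: Monad' m => [m a] -> m [a]
sequence' = foldr (liftM2 (:)) (pure [])

-- mapM' written in terms of sequence'.
mapM' :: Monad' m => (a -> m b) -> [a] -> m [b]
mapM' f = sequence' . map f
```

Loading this in GHCi and trying sequence' [Just 1, Nothing] or mapM' (\x -> [x, x + 1]) [1, 10] is a quick way to see how the same definitions behave for two different types.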

Step 2 Notes
I found it natural to write:
lFlatMap xs f = concat $ map f xs

mFlatMap Nothing _ = Nothing
mFlatMap (Just x) f = f x
However, given the similar type signatures, I can instead write:
mFlatMap xs f = mflatten $ mmap f xs
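
To make that concrete, here is a small sketch with both versions side by side. The implementations of mflatten and mmap are my own guesses at the obvious answers to step 2, the name mFlatMapDirect is only there so both versions can live in one file, and halve is a made-up function for trying them out.

```haskell
-- Possible answers to the first half of step 2.
mflatten :: Maybe (Maybe a) -> Maybe a
mflatten (Just inner) = inner
mflatten Nothing      = Nothing

mmap :: (a -> b) -> Maybe a -> Maybe b
mmap f (Just x) = Just (f x)
mmap _ Nothing  = Nothing

-- The version I found natural: pattern match directly.
mFlatMapDirect :: Maybe a -> (a -> Maybe b) -> Maybe b
mFlatMapDirect Nothing  _ = Nothing
mFlatMapDirect (Just x) f = f x

-- The version suggested by the type signatures: map, then flatten.
mFlatMap :: Maybe a -> (a -> Maybe b) -> Maybe b
mFlatMap xs f = mflatten $ mmap f xs

-- The list version has exactly the same shape.
lFlatMap :: [a] -> (a -> [b]) -> [b]
lFlatMap xs f = concat $ map f xs

-- Hypothetical example function: succeeds only for even numbers.
halve :: Int -> Maybe Int
halve n = if even n then Just (n `div` 2) else Nothing
```

Both mFlatMap (Just 4) halve and mFlatMapDirect (Just 4) halve should give the same answer, as should the failing cases such as Just 3 and Nothing.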

Step 3 Notes
After some reflection I realised it is important to ignore the function names and implementations. I had internalised the concept of map as a minor generalisation of i.e. traversing across elements in a 'container' and applying the same function to each one, which didn't quite fit with Maybe.

The key is simply the function signatures. If I have some functions f and g, such that
f :: (a -> b) -> t a -> t b
g :: t (t b) -> t b
then I can automatically construct a function h
h :: t a -> (a -> t b) -> t b
where h is simply the composition of g with f and nothing more can be said about it.

It does not matter that f is called map for List and it is not helpful to compare implementations of f for two different types t, looking for similarities beyond the type signature.

For me this is a case where attaching a meaning to a name impeded the process of abstraction.
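
The construction of h from f and g can be written down once, for any t, as a function that takes f and g as arguments. This is only a sketch: composeFlatMap is a name I have made up, the Prelude's fmap stands in for the Maybe version of f, and mflatten is my guess at the step 2 answer.

```haskell
-- h built purely from the types of f and g. Note that f is used at the
-- result type 't b', which is why its type mentions 't (t b)'.
composeFlatMap
  :: ((a -> t b) -> t a -> t (t b))  -- f, the map-like function
  -> (t (t b) -> t b)                -- g, the flatten-like function
  -> t a -> (a -> t b) -> t b        -- h
composeFlatMap f g x k = g (f k x)

-- For lists, f is map and g is concat.
lFlatMap :: [a] -> (a -> [b]) -> [b]
lFlatMap = composeFlatMap map concat

-- For Maybe, f is fmap and g is mflatten.
mflatten :: Maybe (Maybe a) -> Maybe a
mflatten (Just inner) = inner
mflatten Nothing      = Nothing

mFlatMap :: Maybe a -> (a -> Maybe b) -> Maybe b
mFlatMap = composeFlatMap fmap mflatten
```

Nothing in composeFlatMap knows whether t is a list, a Maybe or something else entirely, which is the point of the note above.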

Tuesday, October 28, 2008

Groups in Haskell

sigfpe wrote an interesting post that includes implementations of a Group in Haskell, amongst other things I don't understand, such as vector spaces and Hopf algebras.

Sunday, October 19, 2008

Australian government guarantees bank deposits

Last weekend the government announced that it is guaranteeing bank deposits for the next three years (Couriermail, ABC stories). This is interesting given the enormous profits the banks have made in recent years. Have they not put enough away for a rainy day? After all, they helped create the current mess.

While I don't want to lose any money I have deposited in the banks, I don't see it as appropriate for the government to use my tax dollars to help the banks profit. So before receiving one cent, the banks should deal with the consequences of their actions like any other business does - accept lower/no profit, cut costs (e.g. excessive executive salaries, exorbitant consulting fees), etc.

I am not an economist, anthropologist or sociologist, but perhaps there are bigger issues at play, such as the fractional-reserve banking system and Western society's (media-reinforced) consumer driven mentality.

Saturday, October 18, 2008

Learn You a Haskell for Great Good!

When I was first learning Haskell I stumbled upon A Gentle Introduction to Haskell, Version 98, except that "gentle" seemed completely the wrong adjective. :-) I think Yet Another Haskell Tutorial was the most useful online tutorial I found at the time.

Now there is Learn You a Haskell for Great Good!, found via Just Testing. Looks like a good introduction for those of us who come from mainstream, imperative languages.

Friday, October 17, 2008

Thoughts about types

Types in programming languages define logical properties of a program in that language.

N.B. I am not talking about type annotations, i.e. the programmer may explicitly specify types or they may be inferred.

When a program successfully type checks, the execution of the type checker constitutes a proof of those logical properties w.r.t. the program (assuming correct, consistent behaviour of the type checker itself).

If a program fails the type checker, it implies that (as far as the type checker is concerned) there is ambiguity or logical inconsistency w.r.t. the program (again assuming no bugs in the type checker itself).

Partial functions subvert the type system and therefore undermine the value of the logical proof provided by the type checker. Apparently there is a relationship between this, Turing completeness and the Halting problem, although I don't understand that yet.
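
A small Haskell sketch of what I mean by partial functions undermining the proof. The function names are made up for illustration. All three definitions type check, but only safeFirst actually delivers what its type claims for every input.

```haskell
-- Partial: the type promises an 'a' for every list, but there is no
-- sensible result for the empty list, so it fails at runtime instead.
unsafeFirst :: [a] -> a
unsafeFirst (x:_) = x
unsafeFirst []    = error "unsafeFirst: empty list"

-- Total: the possibility of "no result" is made explicit in the type.
safeFirst :: [a] -> Maybe a
safeFirst (x:_) = Just x
safeFirst []    = Nothing

-- Non-termination is another way to claim any type without ever
-- producing a value of it. Evaluating this never finishes.
loop :: Int
loop = loop
```

So unsafeFirst and loop pass the type checker even though the property their types suggest does not hold for all inputs, whereas callers of safeFirst are forced to handle the Nothing case.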

One of my goals over the summer is to work through Types and Programming Languages by Benjamin Pierce.

Monday, October 13, 2008

Zermelo and set theory

I am currently studying axiomatic set theory. In the process I came across Zermelo and Set Theory, which gives details about his work and interactions with other significant mathematicians.

While interesting, it unfortunately didn't help me when working out his proof for the Schröder-Bernstein Theorem in an assignment. :-)

Wednesday, October 8, 2008

Digital Camera

Given the shortlist from my requirements, I bought the Fujifilm FinePix F100fd. Tried it out at the Amberley Air Show on the weekend and am happy with the results so far. Some of the photos are here on Flickr.

On a side note, I also shot some video from our digital video camera and tried uploading that to Flickr as well, but after the Flickr processing the videos looked terrible (mind you my original footage was a bit dodgy to start with :-)). I guess I need to find a decent private/public video sharing site. Vimeo and Viddler look interesting.

Wednesday, October 1, 2008

Digital Camera Requirements

My wife and I have used an old digital camera (generously passed on from a friend when they upgraded) for some years now and the time has finally come to upgrade. So this is what I think we need in a camera:

  1. Neither of us know much about photography so we need to be able to point-and-click and still get reasonably good quality images in different light conditions.

  2. 5x optical zoom or greater.

  3. Low response time between pressing the button and taking the photo.

  4. Wider angle would be handy for those landscape or family photos.

  5. Prefer AA batteries over lithium.

  6. Viewfinder might be nice for outdoor shots when it is difficult to view the LCD.

I would like to spend no more than AUD $400. From reading reviews, I arrived at the following shortlist:
  • Fujifilm FinePix F100fd
    Image quality scored well in reviews: DigitalCameraReview, Digital Camera Resource Page, cnet. Has a 5x optical zoom and a 28mm wide angle lens. No viewfinder and uses a lithium-ion battery. DigitalCameraReview thought this was a great camera for the point-and-click user.

  • Panasonic Lumix DMC-FS20
    Image quality scored well in reviews: DigitalCameraReview, cnet, Steve's Digicams. Auto mode got a good rap. 4x optical zoom and 30mm wide angle lens. No viewfinder, although screen did well outside and has a lithium-ion battery. Cnet thought this was a great camera for the point-and-click user, but it would be a bit over budget.

  • Sony Cyber-shot DSC-W150
    Image quality didn't seem to score quite as well as the previous cameras, although still good: DigitalCameraReview, Steve's Digicams. 5x optical zoom and 30mm wide angle lens. Small viewfinder and lithium-ion battery.

Wednesday, September 24, 2008

Worlds: Controlling the Scope of Side Effects

The state of an imperative program—e.g., the values stored in global and local variables, objects’ instance variables, and arrays—changes as its statements are executed. These changes, or side effects, are visible globally: when one part of the program modifies an object, every other part that holds a reference to the same object (either directly or indirectly) is also affected. This paper introduces worlds, a language construct that reifies the notion of program state, and enables programmers to control the scope of side effects.

This paper is from the Inventing Fundamental New Computing Technologies project at the Viewpoints Research Institute (founded by Alan Kay).

Seems like it is intended to be quite a coarse-grained approach to program state. A little like a developer checking out a copy of a project from a version control system, making changes and either committing or reverting back.

Tuesday, September 16, 2008

State in Clojure

Many people come to Clojure from an imperative language and find themselves out of their element when faced with Clojure's approach to doing things, while others are coming from a more functional background and assume that once they leave Clojure's functional subset, they will be faced with the same story re: state as is found in Java. This essay intends to illuminate Clojure's approach to the problems faced by imperative and functional programs in modeling the world.
From: Values and Change - Clojure's approach to Identity and State.

Friday, September 5, 2008

Farewell Marion

A brave 10 year old boy shed a tear and said goodbye to his mother at her burial site today.

Cancer claims another victim and now her husband and two boys start a new journey together.

Tuesday, August 26, 2008

Apple Store Invoice Missing Labels

The AirPort Express arrived the day after ordering it though. :-)

Wednesday, August 20, 2008

File Management

I need to access my personal files in different locations such as home, work office or when traveling. I wish to be able to move between a number of trusted computers, rather than being tied to a specific laptop, as I like to cycle to and from work. I also don't want the hassle of running my own server at home (have done this in the past).

So my first step in solving this is to store all my files in a Distributed Version Control System. I get all the benefits of a common centralised VCS such as version history and version management between multiple machines, as well as being able to work normally when I don't have an internet connection (e.g. travelling). I have chosen Git, although Mercurial would probably be a fine choice too.

The second step is having a location for a master repository that is accessible over the internet. I could have purchased a hosted linux virtual machine, but I didn't want to deal with setting it up, security, software upgrades, etc. Git can synchronise repositories located at different points on the same file system, so I thought I would try a locally mounted, encrypted virtual file system over Amazon S3. I chose JungleDisk for this purpose.

As it is only me using these Git repositories, I only have one machine writing to the master at any one time, so I don't have to worry about concurrency issues. Secondly, whenever I clone a repository from the master, I use the --no-hardlinks option, although I am not sure if that is necessary.

In principle the ideas have worked out pretty well. I have run into some issues though. From minor to major:

  • S3 has been unavailable on two occasions when I tried to access it in the last three months.

  • Sometimes I have had errors pulling (synchronising) from the master. Recreating the local repository by cloning it again from the master has solved these issues. This may also be similar to the next one.

  • I have had a case where I don't get any errors pulling from the master, but I don't get the latest commits pushed from another machine either. This one has been a real pain. In the process of getting everything back to a stable state, I updated to Git 1.6.0, JungleDisk 2.10a, deleted my local JungleDisk caches and reduced the cache size down to the minimum (I would have liked to turn caching off altogether). I suspect the JungleDisk caching was the issue, but that is only a guess. Will see how things go over the next few weeks.

I now don't need backups from a file deletion point of view, as the VCS takes care of that (I am not using any of the Git features to modify history). I also keep a subset of the machines synchronised on a daily basis, so I don't need backups from a hardware failure/lost/stolen perspective either.

Tuesday, August 19, 2008

Introduction To Scala Training Course

Workingmouse is running an Introduction To Scala training course from Tuesday 2nd September to Thursday 4th September. The course will be run in an interactive, "hands-on" format, with a small number of people, ideally between 4 and 8.

The course covers topics including:

  • Scala syntax

  • A brief introduction to the essence of Functional Programming

  • Algebraic Data Types and Pattern Matching

  • Closures and Higher-Order Functions

  • Integration with Java

  • Intermediate Functional Programming topics (if time permits)

Sunday, August 17, 2008

Scoodi and Me on TV

Well last week I made my TV debut, not that being on TV is something I aspire to. The subject was Scoodi and the show was Brisbane Extra.

I thought I would clarify some of the things that were said though.

First of all, I am not the brains behind Scoodi. That honour would go to Pete, who engaged us to develop the site. During that process I decided that I liked the purpose of Scoodi and thought it had potential, so became more involved.

The filming/interviewing felt pretty awkward and contrived to me. The TV crew were good guys who were trying to make things relaxed, but I wouldn't normally say things like: "You've got council pickup, garage sales and now you've got Scoodi" and "it beats a trip to the tip". Those lines seemed pretty corny at the time.

Overall though, I was pleasantly surprised at how well the segment turned out. I was even more surprised at the number of people that visited the site and signed up. I really hope they find it useful.

Wednesday, August 13, 2008

Maths Symbols in NeoOffice

I wanted to have a ℤ character (representing the integers) in a NeoOffice document on Mac OS X 10.4. I couldn't find an installed font containing the character and according to Wikipedia I need a Blackboard Bold typeface style. After a few Google searches I found a jsMath font that seemed to work fine.

Tuesday, August 12, 2008

GMail Down

Both personal and work GMail is down.

Sunday, August 10, 2008

Paper Airplanes

Alex's paper airplanes and Joseph Palmer's Paper Airplanes have good paper airplane designs. Fun for the whole family!

Thursday, August 7, 2008

Optus Mobile Outage

Optus mobile users were not happy here in Brisbane when they couldn't use their phones yesterday due to a software issue. Story coverage here and here.

Tuesday, August 5, 2008

Presentation: Paul Cormier from Red Hat at QUT

Yesterday, Paul Cormier, Executive Vice President and President, Products and Technologies, Red Hat gave a talk titled "Red Hat and the Open Source Software Business: from boxed software to a half billion dollar server company" at QUT.

Paul made two points that stand out in my mind:

  1. The primary differentiator of an open source software project (compared to closed source) is in the ability to build a community of people around the project.

  2. He sees Red Hat as taking open source work and packaging it in a way that is palatable from an enterprise perspective. Specifically, providing stability and backward compatibility for a 'long-enough' period of time.
If I consider Scala in this light, it is currently doing well on the first point, but not so well on the second. Given that the primary users of Java are enterprises, this is important if mainstream penetration is an objective.

Having said that, I think there is a time when the cost of backwards compatibility outweighs the benefits and my threshold seems to be a fair bit lower than that of the average enterprise. In this case, I think enterprises would do well to step back and consider the real costs in doing a software project in Java (or Ruby/Groovy for that matter) and seriously consider the potential benefits from more advanced programming languages.

Monday, August 4, 2008

Friday, August 1, 2008

Introduction To Logic

Carnegie-Mellon Open Learning Initiative: Logic and Proofs course

Friday, July 25, 2008

FileVault and EncFS on Mac OS X (Leopard)

In the past I have used FileVault in an attempt to protect files in the event of my Mac falling into dubious hands. I find it frustrating on a few counts:

  • the root user can access your files while you are logged in

  • it regularly asks to reclaim space when logging out, making logging out much slower

On my linux machine I am using EncFS, which is working out well. So I thought I would try and install EncFS via MacPorts. I ran into all the following issues:

Then I ended up with an error I couldn't find a solution for:

$ sudo port install encfs
---> Building encfs with target all
Error: Target returned: shell command " cd "/opt/local/var/macports/build/_opt_local_var_macports_sources_rsync.macports.org_release_ports_fuse_encfs/work/encfs-1.4.2" && make all " returned error 2
FileUtils.cpp:163: error: 'make_binary_object' was not declared in this scope
make[2]: *** [FileUtils.lo] Error 1
make[1]: *** [all-recursive] Error 1
make: *** [all] Error 2

Error: Status 1 encountered during processing.

So I uninstalled MacFuse in MacPorts and used the installer from the MacFuse homepage. After installing EncFS from here and a reboot, I had success.

Thursday, July 24, 2008

UQ Student Information System Error

Tried to access my class timetable from the my.UQ website.

Wednesday, July 23, 2008


SmugMug doesn't have a free account, but it does have a 14 day trial.

As soon as you try and upload some photos, you are forced to categorise them in a gallery. I find this frustrating and instead prefer how flickr works - upload photos and organise using tags and/or sets (similar to galleries) afterwards. So as a workaround I created a 'Default' gallery, which contains all photos I don't explicitly put in other galleries.

As mentioned in my flickr post, I found viewing photos in a gallery more efficient than in a flickr set due to the thumbnails being present. There is also the flexibility to choose from a number of different layouts for each gallery. On the downside, a long description for a gallery takes up a lot of screen real estate at the top, pushing all the photos down. The navigation for some photo actions seemed a little inefficient as it requires clicking on the Photo Tools drop down box and then selecting the action to take.

The first step in improving privacy for family photo sharing is to create a SmugIsland. I set 'Hello world!' to 'No' and 'Hello smuggers!' to 'No - make me an island'. You can set a site-wide viewing password. However, this disables showing a map on your homepage. Google Maps is used and apparently Google requires that all usage of Maps be done without requiring a password. I was also confused by the privacy settings in a gallery. Even though I had set up a SmugIsland, a gallery's privacy settings seemed to show that it was public. Supposedly this is correct and the gallery will inherit my SmugIsland settings.

To use maps requires getting a Google Maps key, which is very simple. From there on though, I had very little success with maps. It looks like you can only set the location of photos individually. Once configured I could see a map of my photos that I had set a location for, but a non-logged in user (e.g. family) just sees an empty box where the map should be. When I was logged in and clicked on a pin on the map, it zoomed in, but then showed no photos. This feature just does not seem to work with SmugIslands.

At this point I stopped further investigation as the negatives outweigh the positives. Looks like I will be using flickr.

Tuesday, July 22, 2008


I already had a dummy yahoo id for dealing with freecycle, so signing up to trial flickr was simple. It didn't take long to set the default privacy level to viewable by family only, upload a few photos and add some tags and descriptions. The descriptions support hyperlinks, although you have to enter the HTML manually.

To set the geographical location on a photo you drag a pointer for the photo to a location on a map. It just so happened that one of my photos was taken at a place where the flickr map didn't have enough data, so you couldn't zoom in far enough to be accurate. Otherwise though, the mapping features seemed to work well.

I like the fact that uploading photos requires no categorisation and you can easily view all your photos (flickr calls it your photostream) from several different perspectives.

To share private photos with non flickr users, you use a guest pass. The help mentions that you can share your whole photostream using a guest pass, but the "Share This" link was not appearing on my photostream page. It seems there is a bug if all your photos are private. Making one photo public solved the problem. After generating the guest pass, the photo can be set back to the previous privacy level.

I find the site looks very clean and uncluttered, which I like. The navigation at the top and the expanded nav at the bottom was helpful, so getting around was easy. For flickr to be useful to share your photos you really need a Pro Account. It costs US$25/year and you get unlimited storage, which seems very reasonable.

Now for the things I don't like. When you click on a photo to view it, the resulting page has a widget that only shows thumbnails for the previous and next photos. You can go forwards or backwards in the thumbnails or click on browse to go back, but when viewing photos I found this quite cumbersome. SmugMug does this much better, by showing more thumbnails next to the main image. Viewing photos in flickr as a slideshow looks like a good solution though.

Users with a guest pass for the whole photostream (including private family photos) can see your sets and each photo's description, tags and camera data. However, they:

  • cannot see the geographical location for, or find your private photos on the map

  • cannot browse your private photos via your tags. The tags themselves are visible, but if all your photos are private, clicking on a tag takes you to a "no results found" page, which seems contradictory.

  • cannot comment on your photos

In summary, flickr scores reasonably well against my requirements. Guest pass users have a very basic level of access. If they want more access to your photos, they need to sign up for their own account, which will cost them their time, but nothing financially. Then you can add them as a family contact.

Monday, July 21, 2008

Family Photo Sharing

Most of my extended family do not live near us and I would like a way to keep them more updated with what we are doing. I envisage one aspect of the solution will be sharing our photos with them more frequently than when we visit one another. With that in mind I started to consider what I need from a photo sharing site.

Given that the primary motivator is sharing, an over-arching goal is convenience for family members. More specifically:

  • They shouldn't have to sign up to the site if they don't want to. I would prefer to give them a url and possibly a password as a starting point. This way they only need to deal with a small part of the complexity of the overall site to achieve the main benefit. As an aside, I consider facebook an example of what not to do in this regard.

  • Can view all photo details, such as date, tags/keywords, description, geographical location, camera details

  • Can browse by date, album, tags/keywords and a map.

    I am using the term album to refer to an arbitrary grouping of photos that I specify, e.g. for a holiday, birthday party, etc. Ideally the album would also have a description of its own.

  • Original photo is downloadable

  • Can make comments on photos

  • Flexibility to receive notifications of new photos by a mechanism they understand (e.g. RSS, email), if they wish.

From my perspective:
  • Convenience. The easier it is to upload photos and describe them, the more likely I will actually do it.

  • Security. I wish to restrict my photos to people I choose and then everything is read-only for them, except for making comments.

  • Photo metadata. Tags/keywords, geotagging and a textual description that supports basic HTML such as hyperlinks.

  • Retain ownership of all my content

  • Some sort of group/community feature might be useful. For example if we are on holiday with other family members (that choose to use the same photo site) then being able to group our photos together for that time might be handy.

  • Export/backup. A tool or API that enables photos and metadata to be exported for backup.

In building my shortlist of sites to try I stumbled upon 45 Photo Sharing Sites.

I discarded photobucket due to its limits and sharing in picasa web albums looked confusing. I am trying out flickr and smugmug.

Wednesday, July 9, 2008

Converting a CD to FLAC Files (Mac OS X)

I have the following software installed:

The following process would be significantly simpler if one app provided both a decent rip log and metadata lookup.

There is no point being concerned with the quality of the audio unless you know if there were errors introduced in ripping the data off the CD. xAct shines here. Max does not provide enough details, although there is a ticket open with the developer to fix that.

Max does well with metadata. xAct is poor.

The process
$TMP and $GIT_REPO represent file paths, e.g. TMP=$HOME/tmp
  1. Rip CD using xAct, saving files to $TMP/xAct. [1]

  2. Save xAct output to $TMP/xAct/

  3. Check the log for errors. After the first section of the log file, search for each occurrence of 'track' to quickly find the right spots to check.

  4. Open Max. It will automatically detect the CD and pop up a window. Check the metadata and fix/enter it if necessary.

  5. Download album art. Press Command-D in Max. It will automatically search. Choose one of the listed images (assuming some are found). [2]

  6. Rip to FLAC using Max (I have set my output directory to $TMP)

  7. $ mkdir $TMP/max-rip
    $ mv $TMP/[Artist]/[Album]/*.flac $TMP/max-rip

  8. Convert $TMP/xAct/*.wav to FLAC using Max.

  9. $ mv $TMP/*.flac [Artist]/[Album]

  10. Ensure that all files in [Artist]/[Album] have the same names as those in max-rip. Assume max-rip as correct - we set the metadata in step 4. The following should identify if there are any differences:

    $ cd $TMP/[Artist]/[Album]
    $ ls -1 ../../max-rip > /tmp/ls.txt; ls -1 | diff /tmp/ls.txt -; rm /tmp/ls.txt

  11. Generate FLAC fingerprints:

    $ metaflac --show-md5sum *.flac > ffp.txt
    $ mv ffp.txt "$GIT_REPO/flac fingerprints/[Artist] - [Album].ffp.txt"

  12. $ mv $TMP/xAct/ "$GIT_REPO/log/[Artist] - [Album]"

  13. Transfer tags from the files ripped with Max. I use the following script:
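    # Copy the tags from each Max-ripped file onto the file of the same name here.
    find . -name "*.flac" | while IFS= read -r FILE
    do
      # Quoting the variable handles spaces in file names, so no escaping is needed.
      metaflac --export-tags-to=- "../../max-rip/$FILE" | metaflac --import-tags-from=- "$FILE"
    done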
  14. Transfer album art (if it was found and used in step 5) from the files ripped with Max. I use the following script:
    metaflac --export-picture-to=cover.jpg ../../max-rip/01*
    ls -lh cover.jpg
    echo "Adding album art ..."
    find . -name "*.flac" -exec metaflac --import-picture-from=cover.jpg {} \;
    rm cover.jpg
  15. Move $TMP/[Artist]/[Album] to music library

  16. rm -rf $TMP/xAct $TMP/max-rip

  17. In the root directory of the music library:
    $ md5deep -rl * | sort > $GIT_REPO/md5deep.txt

  18. Check the $GIT_REPO changes and commit

[1] See FLAC Encoding Guide for mac for details on using xAct.

[2] At this stage I do not know the correct way to handle album art, as the players have a variety of ways of dealing with it. For the moment, I have chosen to store it in the metadata of each file.
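
The filename check in step 10 can also be wrapped in a small function so it leaves no file behind in /tmp if interrupted partway. This is only a sketch: the name same_names and its two arguments are made up for illustration and are not part of xAct, Max or metaflac.

```shell
# Hypothetical helper: report filename differences between two directories.
# Prints nothing and returns 0 when both directories contain the same names;
# otherwise prints the diff and returns a non-zero status.
same_names() {
    reference_dir=$1   # names assumed correct, e.g. $TMP/max-rip
    candidate_dir=$2   # names to check, e.g. $TMP/[Artist]/[Album]
    reference_list=$(mktemp) || return 2
    ls -1 "$reference_dir" > "$reference_list"
    ls -1 "$candidate_dir" | diff "$reference_list" -
    status=$?
    rm -f "$reference_list"
    return $status
}
```

It would be called as same_names "$TMP/max-rip" "$TMP/[Artist]/[Album]", with any output indicating names that need fixing before the tags are transferred in step 13.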

Tuesday, July 8, 2008

Music Management

After going to all the trouble of ripping and encoding my CDs to a lossless format, I want to:

  • Ensure integrity of the music library, i.e. at any point be able to validate that all the files exist, their contents haven't changed and that there are no extra files.

  • Have a recovery strategy should there be a problem with the files.

Ideally, I would satisfy these requirements by placing all the music in a Distributed VCS and storing a master copy somewhere like S3. Unfortunately there are a couple of problems:
  • I tried out Git, but after the initial commit of a music file, the repository storage space on the filesystem took up twice the size of the music file. Furthermore, changing metadata such as fixing a spelling mistake in the track name and committing increases the repository by the full size of the file again. I assume this is because the files are binary and already compressed. I didn't try out Mercurial, but I expect it will be the same.

  • The music files are already large, even without the extra overhead of the previous point and the data transfer costs here in Australia are just too high.

My current solution:
  • Store the music library on a removable drive on the Mac at home.

  • Keep a copy of the music library on my computer at work by either periodically taking in the removable drive and using rsync or copying newer music onto a USB drive if physical space is at a premium, such as when cycling.

  • Put checksums of the files in a Git repository stored on both machines. I can then verify the integrity of a music library at any time. Currently I use md5deep because it can recursively process a directory tree and is available for both linux and Mac OS X. The default md5 program on the Mac does not seem to have the same feature set as md5sum on linux.

  • I also store FLAC fingerprints in the Git repository. FLAC files store a checksum of the uncompressed audio in the metadata and various tools, such as xAct on the Mac, can verify the file against that. I am not sure how useful storing the fingerprints is, but I can think of a few unlikely situations where it might be helpful, plus it is small and easy to generate anyway.

To verify a music library, I do:

$ md5deep -rl * | sort | diff $GIT_REPO/md5deep.txt -

where $GIT_REPO represents the appropriate file path and the command is run from the root directory of the music library ($MUSIC_LIBRARY).

I originally tried the matching feature of md5deep instead:

$ md5deep -rX $GIT_REPO/md5deep.txt *

However this does not catch the case where a file has been deleted in the music library but is still present in the Git checksum file.
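
To illustrate why the diff approach catches deletions as well as changes and extra files, here is a minimal sketch. It assumes md5sum from GNU coreutils as a stand-in for md5deep (so the path format differs slightly from md5deep -rl output), and the function names make_manifest and verify_library are invented for this example.

```shell
# Hypothetical stand-in for "md5deep -rl * | sort": prints one "hash  path"
# line per file under the given directory, sorted for a stable comparison.
make_manifest() {
    ( cd "$1" && find . -type f -exec md5sum {} + | sort )
}

# Compare a library against a stored manifest. diff lists changed, extra and
# missing files alike, and returns a non-zero status if there is any difference.
verify_library() {
    library_dir=$1     # root of the music library
    manifest_file=$2   # e.g. $GIT_REPO/md5deep.txt
    make_manifest "$library_dir" | diff "$manifest_file" -
}
```

A file that was deleted from the library appears as a line present only in the stored manifest, which is the case the matching mode above misses.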

Thursday, July 3, 2008

FLAC Players on Mac OS X

iTunes doesn't play FLAC files by default, which is not surprising given Apple has kept ALAC proprietary and FLAC is an open competitor.

Cog is a free, open source audio player that supports FLAC. It works, but is fairly minimalistic. I couldn't find any way to manage playlists in the UI (version 0.07), although the Help indicates it supports M3U and PLS. There is no real concept of a library of music - just the files and folders as organised on the filesystem.

Songbird is a free media player built on top of the Mozilla stack. I think the idea is to access media from the web and your own local media in the one tool. Apparently it has FLAC playback issues. It also doesn't read album artwork from metadata yet.

Fluke is a small utility that enables FLAC files to be played in iTunes. The version I tried (0.11) was a little cumbersome to use as you need to import the FLAC files with Fluke, which I found to be a slow process. Secondly, the track number and album art from the FLAC metadata did not show up in iTunes.

Play is a free audio player. It shows your music as a library that can be browsed via different attributes, has playlists and keyboard shortcuts. No cover flow or album artwork though, but for me that is icing, rather than a necessity.

I am trying Play for now and will give Songbird another go once it is more reliable.

Wednesday, July 2, 2008

Scoodi vs Freecycle

Was asked about this recently. Three reasons I prefer Scoodi over Freecycle:

  1. With Freecycle I drown under the volume of email. With Scoodi, I have more options to tailor the way I hear about new listings (finer granularity on my geographical area, email, RSS, etc). I find RSS more useful than email for notifications like this.

  2. Freecycle doesn't work well if there are no takers at the time an item is posted to the mailing list. Scoodi provides a solution for this case. It was designed to be a big searchable/browsable inventory of available stuff near you.

  3. Scoodi has free and 'for sale' stuff on the same site, whereas Freecycle is only for free items. I find this convenient, as an item is still being recycled/re-used whether it changed hands for free or if there was money involved.

Lossless Audio Formats

I like to rip my CDs to a lossless format so I have:

  • a backup if they get damaged, lost or stolen

  • the freedom to decide (and change my mind) on the audio quality/size trade-offs. I can listen to the lossless version on devices that have enough space and encode into a lossy format where space is limited.
In the past I have used Apple Lossless (ALAC), because iTunes was available on all the computers I used frequently and it was convenient. However, Apple has not publicly released an encoder, so iTunes is the only choice for encoding to ALAC. I do not wish to be restricted like this (tis a bit Microsoft-ish) as managing a music collection can take a fair bit of effort and I don't know what platforms I will use in the future.

I have started to try out FLAC. It is an open source format that seems to compare well with other lossless options.

Also looking at the ripping process, managing tags and album artwork, converting between formats, etc. There are a lot of issues to solve.

Friday, June 27, 2008

Pattern Matching can be Fragile

Pattern matching in Haskell is so useful that I have found myself using it extensively. However, in certain cases it has led to code that doesn't adapt to change particularly well. Consider the following:

data Foo = Bar String Int Int

f (Bar _ x y) = x + y

In this case, pattern matching has provided a very convenient way for the function f to bind names (x and y) to the instances of the two Int elements in the Bar constructor.

The key point though, is that f has encoded the entire structure of the Bar constructor even though it only operates on a portion of it - the two Int elements.

This means that should the structure of Bar change by removing the String or by adding other elements, the definition of f will need to change, even though the actual computation it performs is unaffected. More generally, if the number of elements in a Data Constructor is changed, all functions that pattern match on that constructor will need to be modified, whether their body is concerned with those elements or not.

Wednesday, June 25, 2008


If you have had Panadeine Forte, Endone and a Morphine injection and the pain is still so bad that you can't move, then you know you have a problem.

Monday, June 16, 2008

3 Shop Direct Error

I have decided to collect software errors I encounter. Not to necessarily point the finger at specific sites/companies, but to record the evidence I observe about how poor mainstream software development is in general.

This error occurred when I was attempting to view the mobile phones on a specific plan on 3 Shop Direct.

Friday, June 13, 2008


Desperate Housewives essentially functioned as a kind of cognitive heatsink, dissipating thinking that might have otherwise built up and caused society to overheat.

- Clay Shirky
From the video below, which I found here.

Thursday, June 12, 2008

Cross Platform Password Management

Previously I used a MacBook as my main computer and authentication credentials were stored in the native Keychain. I now use a number of trusted machines that are a mixture of Macs and Ubuntu linux and I would like to manage and utilise those credentials from any of the machines.

Many of the credentials are for web applications, so browser integration will be handy. So first of all I am switching from Safari to Firefox on the Macs to have a similar browser on all platforms. I am using a Firefox 3.0 release candidate. Firefox 2.x looks completely out of place on a Mac.

The Firefox Password Manager stores its data in two files (key3.db, signons3.txt) in your profile directory. These files are portable across my machines (I am happy that Firefox isn't integrated with the Apple Keychain), so I manage and share them via Git. By default, anyone with access to the two files can obtain your credentials, so I have also set a master password to take care of that.

To handle the case where I have multiple logins to the same site, I have installed the Secure Login add-on. So far it has worked really well.

For all other credentials I am using KeePassX. It stores its data in a file you specify. The file is portable across my machines and it is also shared via Git.

Wednesday, June 11, 2008

iiNet Customer Service

I am a customer of iiNet for ADSL2+ and VOIP. Up until today I have been happy with the service.

The story starts a couple of months ago. I needed a static IP address temporarily, so to do that iiNet had to upgrade me from a home plan to a business plan (can't get a static IP on a home plan). That all went very smoothly and for an extra $30/month I had a static IP.

Today I phoned iiNet to change back to the home plan, as I no longer require the static IP.

First of all I had to identify myself and I was asked for my username and password. Their hold messages go on about security, yet the first thing they do is ask me for my credentials over an open channel. I refused, so I needed to provide the last two digits of the bank account they direct debit from. So after figuring out which bank I pay this from, I log into internet banking to get the details.

Secondly, I got hit with an unexpected $20 downgrade fee. Might have been nice to know about that back at the beginning of this process.

Thirdly, once they had finished the procedure to revert my plan, I requested to make a complaint. I was informed that since I was no longer a business customer (30 seconds ago they had switched me from business to home) they couldn't accept my complaint as this was the business section, but they would be happy to transfer me to someone in the home section.

After all this, the guy then asks if there is anything else he can 'help' me with and pleasantly wishes me a nice day???

Tuesday, June 10, 2008

JungleDisk on Linux, Take 2

In my first look at JungleDisk on linux, I missed the fact that it uses WebDAV by default. Not only does this expose all your JungleDisk files to anyone that can connect to the WebDAV server on your machine (port 2667 by default), but it also has file metadata issues (timestamps, permissions).

On linux, JungleDisk can also be mounted as a FUSE filesystem.

In switching from WebDAV to FUSE, I had a few minor issues:

  • Disabling WebDAV (optional) is achieved by setting the local port to 0. I tried through the GUI, but it didn't seem to get persisted. Modifying the jungledisk-settings.ini file directly did the trick.

  • When using the command line only daemon, I needed to use an absolute path for the mountpoint.

  • When using the command line only daemon I couldn't figure out how to see pending operations (without having WebDAV enabled). So I switched back to the GUI version.

Saturday, June 7, 2008

JungleDisk on Mac OS X 10.4.11

By default JungleDisk mounts to /Volumes/JungleDisk with read/write permissions to all users. So any other user on the machine can access it while it is mounted (a possibility if an SSH server is running or Fast User Switching is enabled).

This situation can be slightly improved by stopping the automatic mounting and manually mounting to a more appropriate directory. Assuming other users don't have permissions to access your home directory:

  1. In JungleDiskMonitor -> Preferences -> Jungle Disk Options, change the "Mount volume on startup as:" field to be empty.

  2. Quit JungleDiskMonitor and start it up again. The JungleDisk volume should not be mounted.

  3. $ mkdir $HOME/jungledisk

  4. $ mount_webdav http://localhost:2667/ $HOME/jungledisk
Now only root can get to your mounted JungleDisk. However while the JungleDiskMonitor is running, any user on the system can mount your JungleDisk by issuing a similar command to the last one listed above.

A question about this has been asked in the forums, but I can't see a resolution yet.

Configuration is located at $HOME/Library/Preferences/jungledisk-settings.ini. While the file is in plaintext, the AWS secret key and encryption/decryption keys are actually located in the login keychain, not in the configuration file.

The cache is located at $HOME/Library/Caches/jungledisk/cache. It is not encrypted. This can be protected by using FileVault, although in a multi-user situation (SSH server or Fast User Switching), once logged in, the home directory is effectively unencrypted, accessible by root and any other user depending on filesystem permissions. Another option is an encrypted disk image, which has the same sort of multi-user issues once it is mounted. I haven't tried encfs on the mac yet.

Friday, June 6, 2008

JungleDisk on Linux

I have installed JungleDisk on my ubuntu linux machine at work and my mac mini at home. I have set it up to use encryption and have been happily accessing (read and write) the same data from both machines (at different times).

Using the default settings on linux, JungleDisk mounts to $HOME/jungledisk and stores its configuration and cache in $HOME/.jungledisk.

When mounted, it seems that no other user can access the jungledisk directory. I tried as root and got a permission denied error. I was pleasantly surprised by this behaviour.

However, any user with sufficient permissions can access the .jungledisk directory. This contains both a local cache and a configuration file named jungledisk-settings.ini. There are two security issues here:

  • The jungledisk-settings.ini file contains both your AWS secret key and your encryption/decryption keys in the clear.

  • The local cache is unencrypted.
The simple solution was to move the .jungledisk directory into an encrypted encfs directory and create a symlink to it. Problem solved.
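
A rough sketch of that move-and-symlink step is below. It assumes the encfs directory is already mounted; the function name relocate_with_symlink and the example paths in the comments are made up for illustration, and JungleDisk should presumably not be running while the directory is moved.

```shell
# Hypothetical helper: move a directory to a new location and leave a symlink
# behind at the old path, so programs using the old path keep working.
relocate_with_symlink() {
    original=$1      # e.g. $HOME/.jungledisk
    destination=$2   # e.g. a path inside the mounted encfs directory
    # Refuse to act if the source is missing or has already been replaced by a symlink.
    [ -d "$original" ] && [ ! -L "$original" ] || return 1
    # Refuse to overwrite anything that already exists at the destination.
    [ ! -e "$destination" ] || return 1
    mv "$original" "$destination" && ln -s "$destination" "$original"
}
```

The protection only holds while the encfs directory is unmounted; once it is mounted, the settings file and cache are readable again by anyone with permission to the mount point.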

EDIT: There is now a follow-up post.

Bubble Rings (Toroidal Vortices)

Last night on the ABC there was a show called Catalyst that had a story on bubble rings, or toroidal vortices. A toroidal vortex occurs when a fluid flows back in on itself to form a donut shape, like the ring around a mushroom cloud. You can see dolphins making and then playing with them in the video located here.

Thursday, June 5, 2008

Secure File Storage on Amazon S3

I decided to use Amazon S3 for hosted file storage. I would like to store files using rsync as well as storing Git repositories. Some of this data will be private, so I would like it encrypted while it is stored on S3, but decrypted from the perspective of the tools on my computer. I am planning to access S3 from a number of trusted computers running either Mac OS X or linux.

I found a few ways to mount my S3 space as a filesystem:

  • PersistentFS. Free FUSE filesystem for linux, not sure about Mac OS X.

  • ElasticDrive. FUSE filesystem for linux, Mac OS X not available yet. Free for 5GB, significant price jumps after that and the price is per OS installation. However it is a block device and seems to do much more than I need.

  • s3fs FuseOverAmazon. Free FUSE filesystem for linux and Mac OS X.

  • JungleDisk. Mac OS X and linux support. Commercial, but costs only $20, which includes lifetime updates and can be installed on multiple machines. Supports encryption using 256 bit AES. Optional service at $1 per month that provides block-level file updates and resuming file uploads.
I considered trying an encryption layer (encfs, TrueCrypt) over s3fs, but decided to give JungleDisk a go with the hope that it would be simpler.

Sunday, June 1, 2008

JAOO Brisbane, Day 2

A highlight of JAOO Brisbane Day 2 was standing around talking with Erik Meijer, Don Syme, Joe Albahari and some Workingmouse guys. The conversation comparing Java with C# and the JVM with the CLR was a bit of fun. Catching up with Dave Thomas was also good.

The program I followed for the day was:

Keynote: Clean Code, Robert C. Martin
The audience generally seemed to enjoy this presentation. I found the main points to lack justification (no, the jury is not in on TDD) and the attempted emotional guilt trip of associating the points with professionalism unhelpful.

Building RESTful Services with Erlang and Yaws, Steve Vinoski
Steve talked about Erlang briefly and a little more about Yaws. Most of the talk was about REST.

GData, Google, the Cloud, and You, Gregor Hohpe
Two interesting demos. Firstly the Google Mashup Editor and then the Google App Engine.

Language Orientated Programming with F#, Don Syme
There weren't many people in this session, which turned out to be great because Don made it more informal. Most of the session was centered around using F# for parallel and asynchronous tasks. The implementation is done using Monads (but don't tell anyone :-)). Don happily answered questions from people new to functional programming and others comparing with Haskell. He knows his stuff.

Multi-language Programming, Steve Vinoski
I wasn't in much of this session, but while I was there Steve talked about the Blub Paradox, languages suited for XML processing (Scala got a mention) and Concurrency. The problem of shared mutable state got raised again (amazing how often it pops up) and then Steve talked about Erlang using similar material to his previous session.

Overall, I enjoyed the conference and am very appreciative that Dave, Trifork, the speakers and everyone else involved brought JAOO to Brisbane.

Friday, May 30, 2008

JAOO Brisbane 2008, Day 1

The first Brisbane JAOO finished yesterday. This is the program I attended on Day 1 and some high-level thoughts.

Keynote: Why Functional Programming (Still) Matters, Erik Meijer
Erik argued that it will be critical to reduce side-effects (mutating state, IO, etc) and make the remaining ones explicit in our programming languages in order to succeed in a software world that has increasingly large programs and is more connected, concurrent and asynchronous. IMHO this was the most valuable talk of the conference as most developers I meet don't consciously realise the impact of side-effects, as they are so used to them.

Designing for Scalability, Patrick Linskey
Given Erik's talk, it was interesting to observe that dealing with mutable shared state was an underlying theme.

Interaction Techniques Using the Wii Remote, Johnny Chung Lee
This was a fun talk to attend. The interactive whiteboard aspects of Johnny's work look very cool.

Enterprise Patterns, Martin Fowler
I went to this session primarily because I hadn't heard Martin speak before. He is a very strong and confident presenter. So much so, it was like an emotional steamroller. He tried to compel the audience by force of genuine belief and some anecdotal evidence or observations, but with little supporting rationale, which was disappointing. To be fair, I could have given part of this talk about 3 years ago (perhaps not as eloquently) and would have used the same methods.

Introduction to F#, Don Syme
The material Don had to talk about was interesting, although I found it hard to get a feel for the language itself with the one large demo. I think it might have been a bit easier with a series of small demos that illustrate specific points. However, a colleague of mine remarked that it was a pleasant change to see a functional programming talk that didn't involve the fibonacci sequence.

Introduction to Real-time Programming on the Java Platform, David Holmes
I have never done anything close to real-time programming, so I went to this session to learn a bit about it. David really knows his stuff, so some of the content was lost on me. I did learn what priority inversion meant and how it affected the Mars Pathfinder mission.

Languages Panel, Don Syme, Joel Pobar, Wayne Kelly, David Holmes
Four Australians, nice.

Keynote: Simplicity in Design, Erik Doernenburg & Martin Fowler
By this time, I was pretty tired. This session started out with a similar style to the Enterprise Patterns talk, so I ducked out early for a bit of a rest before the social event.

Social Event
Dinner was at the Belgian Beer Cafe Brussels. Met some more people and had some interesting discussions which was great.

Tuesday, May 27, 2008

Task Lists in The Milk

I think I now have my tasks well organised in Remember The Milk.

After signing up, there are a number of default task lists, some of which are: Study, Personal and Work. I deleted the Study one immediately and started putting tasks into the other two. It quickly became frustrating to be switching between the two task lists during the day. What I needed was a consolidated view of both lists, so I tried out Smart Lists.

Smart Lists are essentially saved search queries. The query language is quite powerful, so it was easy to set up the consolidated view. Unfortunately though, if I was in that smart list and created a new task, then the new task would be created in the default Inbox task list, not one of the underlying task lists (Personal and Work).

After some experimenting, it seems that creating a task in a smart list defined by a simple query over one task list works in the desired manner - the new task is in the underlying task list, not the Inbox. However, once the query becomes more complex (I don't understand the rules), a newly created task is placed in the Inbox.

The second issue I had was seeing future dated tasks and current tasks all mixed in together. Ideally I want to work predominantly with one task list/view of current tasks during the day.

So my current approach is to only have one task list for all my tasks (I have called it Tasks, surprisingly enough) and two smart lists. The first smart list is my current work (show tasks with no due date). This smart list is open in a browser tab all day and I can pretty much do all my task operations in it, including creating tasks. The second smart list shows all tasks with a due date.

Each day I need to remember to spend a short block of time dealing with tasks that have a due date for that day, by either removing their due date so they appear in the current work task list or postponing them. Perhaps I need to set up some form of automated reminder on days where dated tasks are due.

This system worked well today.

Saturday, May 24, 2008

Remember The Milk

Earlier this week I signed up for Remember The Milk, a web-based task manager. So far I have been quite happy with it.

Currently my work predominantly consists of many small tasks, many interruptions and adapting to changing priorities on a daily basis. So to handle this situation, I generally view my tasks as a queue and prioritise them by their position in the queue. When I am doing a particular task, often it will decompose into smaller tasks, so I am regularly adding, completing and prioritising tasks. Given the frequency of these activities, I am after a lightweight, efficient solution. Generally I am only interested in the title and priority of a task. Occasionally tasks have a due date, such as paying a bill, but otherwise I do not wish to expend effort in calculating and updating task attributes as my circumstances change.

In the past I have had a MacBook as my work machine and used iCal and Stickies (yes Stickies the bundled Mac app) to manage tasks. All tasks for the current day went on to a desktop sticky note, the rest into iCal. I split the sticky note into two sections: TODO and DONE. During the day I could then add a new task (as a line on the sticky note) to the TODO section, re-order (i.e. prioritise) by moving lines up and down and marking a task as done by cutting the line from TODO and pasting it under DONE. Very low tech, but very fast and low overhead.

On longer running activities, I also found the sticky notes useful to collect ideas and links.

This system has generally worked well, except that it was tied to the MacBook. Sometimes I would work from a different machine and in those cases would end up having sticky notes in different places. Secondly, there was no backup (and no I don't want to use Time Machine).

I have now switched from iCal to Google Calendar. This makes working on different (trusted) computers easy from a calendar perspective. However, Google Calendar does not do task management.

Experience with The Milk so far
It was quick, easy and free to sign up. The keyboard shortcuts are excellent, they greatly reduce the time it takes to create, prioritise and complete tasks. Generally the site is quite responsive, although not as fast as using my Stickies approach. The help is quite good.

Most of my tasks are not that private or sensitive in nature, but some are a little. The Milk site follows the same convention as the Google Apps, and will stay with HTTPS if you arrive at the site that way. Occasionally I have noticed it has reverted back to HTTP, but haven't figured out the pattern yet.

I set up a notes Task List just for storing ideas for longer running activities. You can add text snippets (called notes) to a task. So the tasks in this list are not tasks as such, but containers for the notes. The text box for editing notes is a little small; it would be great if it could be bigger, resizable or a little more accessible.

I haven't tried the Google Calendar integration yet and will do further posts once I have settled in to a good way of working with The Milk on a day-to-day basis.

So far the overall verdict is good and I can now get to my tasks from different computers. It is great to see some more Australian (the Milk crew are in Sydney) software success. Well done guys.

Thursday, May 22, 2008

Setting up an IP Printer in Windows XP

Yesterday I helped a colleague connect to a network printer directly via its IP address in Windows XP. In the Add Printer Wizard, we had to select:

Local printer attached to this computer

Does that seem just plain wrong (i.e. contradictory) to anyone else?

Wednesday, May 21, 2008

Haskell and Performance

I have just read Haskell and Performance via Planet Haskell. I haven't been following the recent discussions on the mailing lists that Neil refers to, but I did spend a good year or so working part-time on a pet project I had in Haskell.

Some Haskell libraries are poorly optimised
At one stage I wanted to use bitwise operations for performance. I tried Data.Bits, but that didn't help a great deal. Then I googled around and found a post about the implementation not being particularly fast.

Haskell's multi-threaded performance is amazing
I did not find this the case at all. I tried forkIO and friends as well as STM. From memory, the STM version was elegant, perhaps even beautiful, but not fast enough. I think the issue had to do with laziness - thunks not being evaluated in the worker threads - which I spent considerable time trying to solve.

Reading the Core is not easy
Some documentation I read suggested reading the core, as well as comments from various people. I tried it, but generally couldn't understand it enough to be of use.

Optimisation without profiling is pointless
I agree with the point that optimising for performance is a waste of time if not necessary. Secondly, using GHC profiling was very helpful.

Final Thoughts
These comments are really only relevant to me and the codebase I was working on and not a generalisation for others. In fairness, when I started my project, I didn't know Haskell or functional programming in general. I had spent the previous years predominantly in Java. However I spent over a year working part-time on my project and in that time it went through many changes and a complete re-write.

For a portion of the project, performance was critical (in the end it was running on about 14 cores over 5 machines). I expended a significant percentage of my time on the project battling with optimising the Haskell code. I tried strictness annotations in the places where strictness would seem to be useful, arrays, unboxing, etc. In the end, I could not build a deterministic mental model of how applying these techniques would affect the program at runtime, especially when using GHC with -O2.

To solve the problem, I rewrote the performance critical portion in Java (I would have used Scala, but did not wish to spend the time getting up to speed with it at that point). My fingers got sore typing the amount of Java code necessary to do things that are trivial in Haskell (e.g. partial application). When finished, the Java version ran many times faster than the Haskell one (I wish I could remember by how much) and furthermore, I could actually reason about the impact on performance when making changes to the Java version.

Hopefully I can fix my gap in knowledge about performance optimisation in Haskell, so I don't have to resort to Java again.

Tuesday, May 6, 2008

Convert and Filter Subversion to Git

The Challenge
I have one large (25GB) Subversion repository that partly has a structure like this:

I wish to convert the docs subtree (including history) into its own Git repository, without the foo directory.

The Solution
One way to achieve this would have been to dump the repository, filter the history, create a new repository, load the filtered history and then convert with git-svnimport.

Instead, I did the following:

1. Convert the docs subtree into a Mercurial repository, excluding the foo directory.
$ hg convert --filemap filemap --config convert.hg.usebranchnames=False file:///path/to/svnrepos/brad/docs docs-hg

filemap is a file in the current directory with only one line in it:
exclude foo

The effect of --config convert.hg.usebranchnames=False is to import onto the default branch in Mercurial. Without it, a docs branch would have been used and carried over to Git in the subsequent steps. I want the final Git repository to have just the conventional master branch.

2. Convert the Mercurial repository to Git.
$ mkdir docs
$ cd docs
$ git init

I installed Mercurial via MacPorts, so to get fast-export to work, I needed to use the right Python:
$ export PYTHON="/opt/local/bin/python2.5"
$ /path/to/fast-export/ -A ../authors.txt -r ../docs-hg

The -A ../authors.txt option simply maps each Subversion commit username to a normal Git author format, the same as with git-svnimport.

$ git checkout master

3. Remove the intermediate Mercurial repository:
$ cd ..
$ rm -rf docs-hg

I did a diff of the docs subdirectories in the Subversion and Git working copies and did a quick check of the history. Looks like it worked successfully.
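A check like this can also be scripted. The following is a minimal Python sketch, not the diff that was actually run above; the function name tree_differences and the example paths are made up for illustration. It walks two working copies with the standard library's filecmp.dircmp, skipping the .svn and .git metadata directories, and lists every path that exists on only one side or whose contents differ.

```python
import filecmp
import os


def tree_differences(left, right, ignore=(".svn", ".git")):
    """Return sorted relative paths that differ between two directory trees.

    Note: filecmp.dircmp compares files "shallowly" by default, so two files
    with the same size and modification time are assumed to be identical.
    """
    differences = []

    def walk(cmp, prefix):
        # Names present on one side only, files whose contents differ,
        # and names that could not be compared or are of mismatched type.
        changed = (cmp.left_only + cmp.right_only + cmp.diff_files
                   + cmp.funny_files + cmp.common_funny)
        for name in changed:
            differences.append(os.path.join(prefix, name))
        # Recurse into directories that exist on both sides.
        for name, sub in cmp.subdirs.items():
            walk(sub, os.path.join(prefix, name))

    walk(filecmp.dircmp(left, right, ignore=list(ignore)), "")
    return sorted(differences)


if __name__ == "__main__":
    # Hypothetical working copy locations; substitute real ones.
    for path in tree_differences("docs-svn-wc", "docs"):
        print(path)
```

An empty result means the two trees match apart from the version control metadata. This only compares the latest checkout, so the history still needs a manual look.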

Monday, May 5, 2008

And the Winner is: Git

I have decided to move from Subversion to a distributed VCS and have been considering Git and Mercurial. I have settled on Git for the following reasons:

  • Git seems more granular. I expect this to provide more flexibility to adapt to different circumstances, but at a greater learning time cost.

  • The way tags are managed in Mercurial (.hgtags) looks a bit odd.

  • The notion Git has of tracking content rather than files is interesting, although I don't understand the ramifications yet.
To be fair, either Mercurial or Git would be suitable for my current needs. Mercurial was initially more attractive as it seemed simpler to get up and going and the Subversion import works better on my existing repository.

There were a couple of interesting posts I found along the way: Experimenting with Git and The Differences Between Mercurial and Git.

Tuesday, April 29, 2008

Recounting the Rationals

Recently I found the paper Recounting the Rationals via Mark Jason Dominus on Planet Haskell.

Mark suggests that the paper is a good first paper to read. Well since I am part way through my first Discrete Maths subject, I thought I would give it a go as my first mathematical paper to have a look at.

I was quite surprised that I could actually understand a fair bit of the paper as I read through it. It was interesting just how many of the concepts from my one Uni subject are brought together in this paper: rational numbers, sequences, trees, proof by contradiction, proof by induction and the notion of relatively prime.

I still need to spend some more time to finish exploring this paper and Brent Yorgey's six-part blog series about it. However, here are a few observations so far:

  • The sequence of rationals shown in the paper is quite elegant and fascinating.

  • The fact that the sequence can be represented as a tree is even more interesting.

  • Starting with sequences, switching to a tree perspective, doing a number of proofs on the tree and then relating the tree back to the sequences was a pretty cool technique.
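The link between the tree and the sequence can be sketched in a few lines of Python. This is an illustration rather than code from the paper: the function names are made up, and it assumes the tree rule as I understand it, namely that the root is 1/1 and each node a/b has a left child a/(a+b) and a right child (a+b)/b. Reading the tree level by level, left to right, then gives the sequence.

```python
from fractions import Fraction


def children(q):
    """Return the left and right children of a/b: a/(a+b) and (a+b)/b."""
    a, b = q.numerator, q.denominator
    return Fraction(a, a + b), Fraction(a + b, b)


def tree_levels(depth):
    """Return the first `depth` levels of the tree, each read left to right."""
    level = [Fraction(1, 1)]
    levels = []
    for _ in range(depth):
        levels.append(level)
        level = [child for q in level for child in children(q)]
    return levels


def sequence_prefix(depth):
    """Flatten the first `depth` levels into the start of the sequence."""
    return [q for level in tree_levels(depth) for q in level]


if __name__ == "__main__":
    # Prints: 1 1/2 2 1/3 3/2 2/3 3
    print(" ".join(str(q) for q in sequence_prefix(3)))
```

Every value in the output appears once and already in lowest terms, which is the property the proofs on the tree establish for all positive rationals.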

So my verdict from the experience is that it has been a good first paper to read. It is only 3 pages long. I could understand the paper at a high level to start with, and it has been interesting and challenging to explore in detail.

Saturday, April 26, 2008


Mutable stateful objects are the new spaghetti code

From the Clojure Rationale.

Wednesday, April 16, 2008

Pair Programming

In the absence of a generally accepted definition of pair programming, for the purposes of this post I define it as two programmers sitting at the same computer engaged in working on the program at hand.

I see pair programming as one form of collaboration. Other common forms of collaboration on a software project include:

  • two people standing around a whiteboard discussing the same topic

  • in the case where people are sitting at their own desks, asking someone nearby a question about their current work

On a side note, Dave Thomas has talked about having a customer (the person with 'requirements') paired with a developer. This sounds like an interesting idea to explore for some situations, although I don't think current, mainstream enterprise technologies (e.g. Java or C# based) would be suitable here, due to the time it commonly takes a developer to do something that would be meaningful to the customer.

My expectation of software developers on the teams I am involved with, is that for whatever task they are working on, they will choose to collaborate with their other team members in whichever ways bring the best business value. Some factors that I have considered in such a decision include:
  • complexity/familiarity with the task

  • skills and abilities of the team members

  • knowledge sharing - in a tangible sense this translates to things such as skills transfer between team members and setting up the project to function appropriately in the case of a team member being unavailable
By the previous statements I disagree with extreme programming and the derivatives that mandate pair programming for all committed code (or are increasingly working towards that objective). Of the numerous projects and people I have been involved with over the last five years, I have not seen the case where such a rule would deliver the best value to the customer, or close to it.

To clarify, I am not blatantly for or against pair programming. It has situational benefits and costs, just like the other forms of collaboration.

Finally, there is one factor that is rarely mentioned by others in my discussions on this topic: pair programming sets up a more social work environment, which some people enjoy more than others. This blog post is already long enough, so I won't comment further on this, except that I would like to see people be honest with themselves if this is a motivator for them, especially if they are trying to convince others about pair programming.

Monday, April 14, 2008

Online Discrete Maths Resources

MIT has some readings for their Mathematics for Computer Science course.

Carnegie Mellon University has also posted lecture notes on a wiki.

Monday, April 7, 2008

What is an agile project?

It seems that I have a different view of what an agile project is, compared to most of the people I talk to about it. Maybe it is because I am involved in business as well as being a software developer.

I assume a software project exists as part of meeting some objective. Often it is a business objective, but could also be non-commercial, such as academic. I will call the party (business or academia) that has the objective and that is paying for the project, the customer.

Looking up the dictionary on the Mac I am writing this with, the definition of agile is:

able to move quickly and easily

Applying that to a software project, I see it as a project where the customer has been able to quickly and easily move in their attempt to achieve their objective.[1]

The interesting twist on all of this, is that at the beginning of a project, the customer usually doesn't know precisely what they want to have at the end of the project. I can think of a few reasons for this:

  1. When design/programming starts, some of the requirements are ambiguous and/or logically inconsistent

  2. There are known problems yet to be solved

  3. For some reason, after work starts, but before it is finished, the customer decides that the set of requirements is different

The standard 'Agile' way of dealing with these issues is to accept them and then have the customer heavily involved throughout the project, clarifying ambiguities, making cost/benefit decisions when presented with alternative solutions, planning and re-planning (i.e. scope and cost are more transparent and can be pro-actively managed in a holistic manner by the appropriate party).

Now, a project may use an existing Agile Methodology to define the process they will follow. Developers may use particular 'Agile' development techniques that they think will support the project. However, if both of these occur on a project with the above issues, but the active customer involvement is missing, then by the previous definitions the project is not agile.

Unfortunately some of the 'Agile' projects I have been involved with and most of the 'Agile' projects I talk with people about are in this category.

[1] Quickly and easily is relative to the actual complexity of their objective

Thursday, March 27, 2008

Speed vs Quality

It seems that there were some issues with voting machines in the US recently. Paul Johnson proposed a reason why.

Too often projects are run predominantly with a short-term business focus. Lip service is paid to quality, but when things get tight, it is not actually that important. However, all that changes once the project is 'finished'. Suddenly the longer-term becomes more important, but by then it is too late.

In a software project, there is generally a relationship between quality [1] (from the developer perspective) and speed of development (from the business perspective). You can sacrifice quality for speed, but as the project becomes bigger or more complex, it becomes harder to change and so development speed decreases. The short-term gain (probably measured in days/weeks) becomes a long-term loss.

At the other end of the spectrum, you can spend most of your time focussed on quality and not worry about speed of development. As previously posted, this is not commercially sustainable either.

IMHO, one of the most interesting things Agile Methodologies bring to a project, is the opportunity to make development cost/speed and quality more transparent throughout the life of the project. In the end though, someone has to make decisions about quality and speed on a day-to-day basis. It seems to me that a business savvy developer will be more likely to succeed than a tech savvy business person. It is a tough line to walk though.

[1] Quality means different things to different people. In this case I am referring to both how well code is written and whether it exhibits correct behaviour.