November 10, 2008

Monoids? In my programming language?
`It's more likely than you think.`

What's a monoid?

A monoid is a simple algebraic structure defined as a set M equipped with an operator * and a neutral element 1 such that the following properties hold:

		associativity   ∀x,y,z ∈ M: x*(y*z) = (x*y)*z
		
		identity        ∀x ∈ M: x*1 = 1*x = x
		
		closure         ∀x,y ∈ M ∃z ∈ M: x*y = z

The operator might (but isn't required to) obey other properties like:

		idempotency   ∀x ∈ M: x*x = x
		
		commutativity ∀x,y ∈ M: x*y = y*x

Nice symbols, but what's the connection?

In programming we often encounter free monoids, where elements of the set M are zero-or-more instances from some other underlying set. Free monoids are often described as strings over some alphabet (where the "alphabet" might consist of characters, integers, or WhizBangObjects).

To see how a data structure can behave like a monoid, let's consider integer lists under the concatenation operation ++. To be a monoid we need a neutral element in the set of lists. Empty lists fit the bill (since concatenating with an empty list is an identity operation). Formally, we can call this the monoid (List<int>, ++, []) or alternatively the free monoid over int values.

		example		[1,2] ++ [3,4] = [1,2,3,4]  
		
		associativity   ∀L₁,L₂,L₃ ∈ List<int>: (L₁++L₂)++L₃ = L₁++(L₂++L₃)
		
		identity        ∀L ∈ List<int>: L++[] = []++L = L
		
		closure         ∀L₁,L₂ ∈ List<int> ∃L₃ ∈ List<int>: L₁++L₂ = L₃

Are lists also idempotent or commutative? It's easy to see that they're not.

		not idempotent  [7] ++ [7] = [7,7]
				[7] ≠ [7,7]
		
		not commutative  [10] ++ [20] = [10,20]
				 [20] ++ [10] = [20,10]
				 [10,20] ≠ [20, 10]

Now, let's dream a little dream and say we magically made the ++ commutative. That is:

		[20] ++ [10] ++ [20] = [20,10,20] 
				     = [10,20,20]  
				     = [20,20,10]!

What an odd situation. Clearly, we're not dealing with lists anymore. In fact, just by twiddling a single algebraic knob we've discovered multisets. If we were feeling particularly nerdy we could drop the word "multiset" altogether and just say "commutative free monoid". This is usually a winning strategy at parties.

Anyhow, let's push this bizarre situation a little further, and assert that ++ should be both commutative and idempotent. Now we get:

		[20] ++ [10] ++ [20] = [20] ++ [20] ++ [10] 
		                     = [20] ++ [10] 
				     = [20,10]

Look ma, it's a set!

The Basics

Datatype	Operator	Neutral Element	Algebraic Properties
List<T>	List<T>.concat	empty list
Multiset<T>	Multiset<T>.union	empty multiset	commutative
Set<T>	Set<T>.union	empty set	commutative, idempotent

Addendum

Some highlights from the reddit discussion:

cgibbard presented a really neat use of monoids to combine comparison operators.

sigfpe wrote:
The most important insight I ever had relating mathematics and computating was when I suddenly realised that when a mathematician talks about 'abstraction' they mean exactly the same thing as a computer scientist. It's a shared interface onto a bunch of different objects.

A monoid is a great example: the interface consists of an object called the identity and a function taking two arguments and returning one. There are then some rules about how these things interact. But the point is that both mathematics and computer science are full of things that 'expose' this interface. Groups, monoids, categories, rings, these are all just interfaces. And what's cool is that for each of these interfaces there are many objects that expose that interface, meaning that if you write code for the interface, the same code is reusable with all of those objects.

AlanCrowe pointed out that I'm writing about something that's been named the Bloom Hierarchy

otto_s argued (convincingly) that functions under composition aren't isomorphic to containers of functions (but rather it's the syntactic terms associated with functions that are). To avoid confusion, I dropped the isomorphism from the table below.

A few more monoids