I've been having a fun time learning Go. The final exercise (http://tour.golang.org/#73 at time of writing) asks you to modify a web crawler to crawl pages in parallel without fetching the same page twice.
One general rule of thumb with parallel programming is never write your own synchronisation code; it's difficult, error prone, and difficult to validate. This is true for monitor based synchronisation, and as neat as Go's channels and goroutines are1, they're still relatively low-level so I believe it's generally true for Go as well. Instead of writing your own synchronisation code, use well written libraries that provide a suitable abstraction.
There are two concurrency problems with the WebCrawler exercise. The first is the problem of making the main function wait until all spawned goroutines complete. The second is the problem of storing and accessing visited URLs in a safe manner.
For the first problem, the Go library provides a WaitGroup type in the sync package. For those who are familiar with Doug Lea's excellent java.util.concurrent package, WaitGroup is similar to the java.util.concurrent.Phaser class.
The WaitGroup type simply holds a counter. Clients can increment the counter (using Add(delta int)) or decrement the counter (using Done()). Clients can also call WaitGroup.Wait(), which blocks until the counter reaches zero.
We want to have the main routine wait for an unknown number of goroutines, so WaitGroup is perfect for our needs. Before spawning a goroutine, we simply increment the counter, and in the goroutine itself, we use Go's defer feature to ensure that the counter is decremented before the goroutine completes.
For the second problem, we need to have multiple goroutines store visited URLs, and also query all visited URLs in a safe manner. A 'test-and-set' type operation is ideal for this, so I created a Visited type with a TestAndSetVisited(url string) function which uses a sync.Mutex to protect reads and writes to a map. I considered a sync.RWMutex, but when using test-and-set semantics, a mutex is simpler. I suspect that a RWMutex may be more efficient if the URL graph was dense, but it doesn't seem worth the hassle.
Below is the tutorial code, with my modifications highlighted in green.
Update: I have since discovered that fmt.Printf is not thread safe, and can interleave output. Concurrency is hard.
Also, Sonia at the Sonia Codes blog has a nice writeup of a very go-like solution to this exercise. It's 'go-like' in that it explicitly shares memory by communicating. See http://soniacodes.wordpress.com/2011/10/09/a-tour-of-go-69-exercise-web-crawler/ .
1. For an introduction to where the ideas behind Go's channels and goroutines came from, have a look at C. A. R. Hoare's Communicating Sequential Processes.