What's happening in Go tip (2013-08-30)

Published: Friday, August 30, 2013
Last modified: Sunday, January 15, 2023

Welcome back to another week of Go tip. This time we’ll mostly be focusing on smaller changes and API improvements. Enjoy!

What’s happening ¶

In this week’s article we will be looking at:

Go is becoming a better citizen
Support for encoding GIF images
User-extensible compression methods for archive/zip
Easier hashing with MD5 and SHA
go get & test dependencies
Subrepositories

Go is becoming a better citizen ¶

Relevant CLs: CL 12541052, CL 12650045, CL 13037043, CL 13038043, CL 13348045

Many of the design decisions behind Go could occasionally lead to scenarios where a misbehaving Go program destabilizes the whole operating system. For example, Go’s split stacks could grow indefinitely, until swap death occurs. Another example is the way goroutines are distributed onto system threads. While languages with more explicit threading make it (somewhat) obvious when and how many threads are going to be created, goroutines hide that fact. One particularly dangerous detail is that a syscall or cgo call that blocks will automatically receive its own thread. This means that thousands of concurrent blocking calls of that kind will lead to thousands of threads. One realistic scenario would be thousands of concurrent DNS lookups.

These issues have been known and acknowledged for a while, but Go 1.2 will finally address them, by introducing various limitations, some of which are configurable, while others are not.

To tackle infinite recursion, a fixed limit on the size of stacks has been added. For 64-bit systems, that limit is 1 GB and for 32-bit systems it is 250 MB. Do note that the specific limits might still be adjusted if they turn out to be “implausibly large”.

A somewhat related change is CL 12650045, which moves all variables that are bigger than 10 MB to the heap. Admittedly, that change was originally made to avoid an issue in the compiler, but it will also make sure that big allocations won’t hit the stack size limit.

On the topic of limits, there is now a limit on the number of threads that may exist. Exceeding that limit will result in a crash (as opposed to trying to stay within the limit, which could lead to deadlocks). The limit is set to 10,000 threads by default, but is configurable through runtime/debug.SetMaxThreads(). This limit addresses the issue of syscalls or cgo calls unexpectedly leading to too many threads.
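As a minimal sketch of how the thread limit can be adjusted: runtime/debug.SetMaxThreads takes the new limit and returns the previous one. The helper name raiseThreadLimit and the value 20000 are made up for illustration.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// raiseThreadLimit (hypothetical helper) sets the maximum number of OS
// threads the program may use and returns the limit that was in effect before.
func raiseThreadLimit(limit int) int {
	return debug.SetMaxThreads(limit)
}

func main() {
	prev := raiseThreadLimit(20000) // 20000 is an arbitrary example value
	fmt.Println("previous thread limit:", prev)

	// Restore the previous limit so the rest of the program is unaffected.
	raiseThreadLimit(prev)
}
```

Raising the limit only moves the point at which the program crashes; it does not make thousands of blocked threads any cheaper.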

A more specific case of limiting threads is that of DNS lookups in the net package, which are now limited to a fixed 500 concurrent lookups. This change has been made in addition to the coalescing of in-flight DNS lookups that I mentioned last week. Not only do these two changes ensure that you won’t bombard your system with new threads, they also work around the issue that some resolvers cannot handle more than 1024 concurrent lookups.

This set of changes reduces a lot of the “risks” and hidden growth behaviors that Go has, by enforcing upper limits. It is, however, still important that you, as the programmer, are aware of these limits. While forcefully crashing your program makes sure that the system stays stable, it’s not exactly a good thing to happen in production systems. You will still have to avoid infinite recursion for obvious reasons, and you should still keep track of how many potential threads you are going to create. While the net package takes responsibility for maintaining a hard limit of its own, other packages do not, and staying below the thread limit is usually the programmer’s responsibility.
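One common way to keep the number of concurrent blocking calls bounded is a buffered channel used as a counting semaphore. The following is a sketch under assumptions, not the mechanism the net package uses: runLimited and blockingTasks are hypothetical names, and time.Sleep merely stands in for a blocking syscall or cgo call.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// runLimited runs every task in its own goroutine, but never lets more than
// limit of them execute at the same time. It returns the highest number of
// tasks that were observed running concurrently.
func runLimited(limit int, tasks []func()) int {
	sem := make(chan struct{}, limit) // counting semaphore with limit slots
	var wg sync.WaitGroup
	var running, peak int32

	for _, task := range tasks {
		sem <- struct{}{} // acquire a slot; blocks while limit tasks are running
		wg.Add(1)
		go func(task func()) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot once the task is done

			n := atomic.AddInt32(&running, 1)
			for { // record n as the new peak if it exceeds the current one
				p := atomic.LoadInt32(&peak)
				if n <= p || atomic.CompareAndSwapInt32(&peak, p, n) {
					break
				}
			}
			task()
			atomic.AddInt32(&running, -1)
		}(task)
	}
	wg.Wait()
	return int(atomic.LoadInt32(&peak))
}

// blockingTasks returns n stand-ins for blocking calls such as DNS lookups.
func blockingTasks(n int) []func() {
	tasks := make([]func(), n)
	for i := range tasks {
		tasks[i] = func() { time.Sleep(time.Millisecond) }
	}
	return tasks
}

func main() {
	peak := runLimited(8, blockingTasks(100))
	fmt.Println("peak concurrency:", peak) // never more than 8
}
```

Because a slot is acquired before each goroutine starts, at most limit tasks can be blocked at once, which in turn bounds the number of threads those calls can pin.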

Support for encoding GIF images ¶

Relevant CLs: CL 10977043, CL 10896043, CL 10890045

While it has been possible to decode GIF images for a long time, encoding GIF images wasn’t possible until now.

Go 1.2 will add a GIF encoder, and because encoding GIF images requires quantization, it also adds facilities for implementing quantizers. For this, two new interfaces have been added to the image/draw package.

The first interface is Drawer, which encapsulates the idea of drawing a section of a source image onto a destination image. A concrete drawer could for example implement Floyd-Steinberg error diffusion. That one is actually included with Go and used by default by the GIF encoder.

The second interface is Quantizer, which consists of the single method Quantize. Its responsibility is to take an image and return a color palette of the desired size. This palette will be used by the GIF encoder to produce a paletted image. Go does not come with an implementation of Quantizer, though, and the GIF encoder will use the Plan 9 palette by default. Alternatively there’s also the “Web-safe palette”, aka the Netscape Color Cube. Both palettes can be found in the image/color/palette package.
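To make the two interfaces more concrete, here is a sketch of a deliberately trivial Quantizer plugged into the encoder through gif.Options. The type webSafeQuantizer and the helpers gradient and encodeGradient are hypothetical; a real quantizer would derive its palette from the image content instead of returning a fixed one.

```go
package main

import (
	"bytes"
	"fmt"
	"image"
	"image/color"
	"image/color/palette"
	"image/draw"
	"image/gif"
)

// webSafeQuantizer is a hypothetical draw.Quantizer that ignores the image
// and always proposes colors from the web-safe palette.
type webSafeQuantizer struct{}

// Quantize appends web-safe colors to p until its capacity is reached.
func (webSafeQuantizer) Quantize(p color.Palette, m image.Image) color.Palette {
	for _, c := range palette.WebSafe {
		if len(p) == cap(p) {
			break
		}
		p = append(p, c)
	}
	return p
}

// gradient builds a small true-color test image.
func gradient(w, h int) *image.RGBA {
	img := image.NewRGBA(image.Rect(0, 0, w, h))
	for y := 0; y < h; y++ {
		for x := 0; x < w; x++ {
			img.Set(x, y, color.RGBA{R: uint8(x * 255 / w), G: uint8(y * 255 / h), B: 128, A: 255})
		}
	}
	return img
}

// encodeGradient encodes a gradient as GIF, decodes it again and returns
// the dimensions of the decoded image.
func encodeGradient(w, h int) (int, int, error) {
	var buf bytes.Buffer
	opts := &gif.Options{
		NumColors: 256,
		Quantizer: webSafeQuantizer{},
		Drawer:    draw.FloydSteinberg, // the encoder's default, spelled out
	}
	if err := gif.Encode(&buf, gradient(w, h), opts); err != nil {
		return 0, 0, err
	}
	decoded, err := gif.Decode(&buf)
	if err != nil {
		return 0, 0, err
	}
	b := decoded.Bounds()
	return b.Dx(), b.Dy(), nil
}

func main() {
	w, h, err := encodeGradient(64, 32)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("round-tripped a %dx%d GIF\n", w, h)
}
```

Passing nil instead of opts makes the encoder fall back to its defaults.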

And because everybody likes animated GIFs, the new encoder also includes a function for producing those: EncodeAll.
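A minimal sketch of producing an animation with EncodeAll might look as follows. The helpers buildAnimation and encodedFrameCount, the 16x16 frame size and the two-color palette are assumptions made for this example.

```go
package main

import (
	"bytes"
	"fmt"
	"image"
	"image/color"
	"image/gif"
)

// buildAnimation returns an animation in which a white column moves across
// a black 16x16 canvas, one step per frame.
func buildAnimation(frames int) *gif.GIF {
	pal := color.Palette{color.Black, color.White}
	anim := &gif.GIF{LoopCount: 0} // 0 means loop forever
	for i := 0; i < frames; i++ {
		frame := image.NewPaletted(image.Rect(0, 0, 16, 16), pal)
		for y := 0; y < 16; y++ {
			frame.SetColorIndex(i%16, y, 1) // palette index 1 is white
		}
		anim.Image = append(anim.Image, frame)
		anim.Delay = append(anim.Delay, 10) // delay in 100ths of a second
	}
	return anim
}

// encodedFrameCount encodes the animation, decodes it again and reports
// how many frames survived the round trip.
func encodedFrameCount(frames int) (int, error) {
	var buf bytes.Buffer
	if err := gif.EncodeAll(&buf, buildAnimation(frames)); err != nil {
		return 0, err
	}
	decoded, err := gif.DecodeAll(&buf)
	if err != nil {
		return 0, err
	}
	return len(decoded.Image), nil
}

func main() {
	n, err := encodedFrameCount(5)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("frames:", n)
}
```

Note that EncodeAll expects frames that are already paletted, and one Delay entry per frame.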

User-extensible compression methods for archive/zip ¶

Relevant CLs: CL 12421043

CL 12421043 improves archive/zip to allow adding new compression methods without having to modify archive/zip directly. Previously, a hard-coded switch covered the known and implemented methods (at the time only DEFLATE and storing without compression). Now the package allows registering new methods, with compressors and decompressors registered separately.

In total the ZIP specification allows 11 different compression methods, out of which Go only supports 2. With this change, the remaining methods can be provided by packages outside of archive/zip. It also allows swapping out implementations for better ones¹.
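Registering a method could look roughly like the sketch below. The method ID 0xFF01 is a made-up value that is not defined by the ZIP specification, and the "method" simply reuses compress/flate at its highest compression level; roundTrip is a hypothetical helper. Registering a method ID twice panics, which is why the registration lives in init.

```go
package main

import (
	"archive/zip"
	"bytes"
	"compress/flate"
	"fmt"
	"io"
)

// methodBestDeflate is a hypothetical method ID chosen for this sketch.
const methodBestDeflate uint16 = 0xFF01

func init() {
	// Compressors and decompressors are registered separately.
	zip.RegisterCompressor(methodBestDeflate, func(w io.Writer) (io.WriteCloser, error) {
		return flate.NewWriter(w, flate.BestCompression)
	})
	zip.RegisterDecompressor(methodBestDeflate, func(r io.Reader) io.ReadCloser {
		return flate.NewReader(r)
	})
}

// roundTrip writes content into an in-memory archive using the custom
// method and reads it back.
func roundTrip(name, content string) (string, error) {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	w, err := zw.CreateHeader(&zip.FileHeader{Name: name, Method: methodBestDeflate})
	if err != nil {
		return "", err
	}
	if _, err := io.WriteString(w, content); err != nil {
		return "", err
	}
	if err := zw.Close(); err != nil {
		return "", err
	}

	zr, err := zip.NewReader(bytes.NewReader(buf.Bytes()), int64(buf.Len()))
	if err != nil {
		return "", err
	}
	rc, err := zr.File[0].Open()
	if err != nil {
		return "", err
	}
	defer rc.Close()
	data, err := io.ReadAll(rc)
	if err != nil {
		return "", err
	}
	return string(data), nil
}

func main() {
	got, err := roundTrip("hello.txt", "hello, world")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(got)
}
```

An archive written this way is only readable by programs that know the custom method ID, so private IDs are mostly useful for experiments.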

Easier hashing with MD5 and SHA ¶

Relevant CLs: CL 10624044, CL 10571043, CL 10629043, CL 10630043

Like a lot of things in Go that transform streams of bytes, the packages for cryptographic hashes (MD5, SHA1, etc.) are implemented as io.Writers. You create a hash, you write data to it, you then ask for the result. In addition to that, the hash.Hash interface has been designed with performance in mind, which is why the Sum() method takes a slice as input, to allow buffer reuse. And even though passing nil is possible, it’s still a somewhat involved process and often leads to confusion for newcomers who are used to simpler solutions à la “take these bytes, give me the result”.

The following piece of code demonstrates the current process:

data := []byte(`hello, world`)
h := sha1.New()
h.Write(data)
hash := h.Sum(nil)
fmt.Printf("H(data) = %x\n", hash)
// Output: H(data) = b7e23ec29af22b0b4e41da31e868d57226121c84

And because the Go team agrees that there should be an easier way to calculate hashes, Go 1.2 will add simple package-level functions that take the bytes to hash as input and return the hash. The previous example can be reduced to this:

data := []byte(`hello, world`)
hash := sha1.Sum(data)
fmt.Printf("H(data) = %x\n", hash)
// Output: H(data) = b7e23ec29af22b0b4e41da31e868d57226121c84

Sum() has been implemented with “no allocations” in mind, which is why it returns an array instead of a slice. Given the fact that the only difference between the new package-level Sum() and the old API is the lack of appending to a slice², the new function should be just as fast, if not marginally and almost unmeasurably faster.
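Because the package-level functions return arrays, a few things differ from the slice-based API. The sketch below illustrates them; the helper names sameDigest, digestSizes and equalContent are hypothetical. The array has to be sliced with [:] where a []byte is expected, its length is part of the type, and two digests can be compared directly with ==.

```go
package main

import (
	"bytes"
	"crypto/md5"
	"crypto/sha1"
	"crypto/sha256"
	"fmt"
)

// sameDigest reports whether sha1.Sum and the streaming API agree on data.
func sameDigest(data []byte) bool {
	sum := sha1.Sum(data) // sum is a [sha1.Size]byte array, not a slice
	h := sha1.New()
	h.Write(data)
	return bytes.Equal(sum[:], h.Sum(nil)) // sum[:] turns the array into a slice
}

// digestSizes returns the lengths of the MD5, SHA-1 and SHA-256 digests.
func digestSizes(data []byte) (int, int, int) {
	return len(md5.Sum(data)), len(sha1.Sum(data)), len(sha256.Sum256(data))
}

// equalContent compares two inputs by digest; arrays support == directly.
func equalContent(a, b []byte) bool {
	return sha256.Sum256(a) == sha256.Sum256(b)
}

func main() {
	data := []byte(`hello, world`)
	fmt.Println("old and new API agree:", sameDigest(data))

	m, s1, s256 := digestSizes(data)
	fmt.Println("digest sizes in bytes:", m, s1, s256)

	fmt.Println("same content:", equalContent(data, []byte(`hello, world`)))
}
```

Being comparable also means the returned arrays can be used as map keys, which the slices returned by hash.Hash cannot.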

`go get` & test dependencies ¶

Relevant CLs: CL 12566046

CL 12566046 adds the -t flag to go get, which will make it download test dependencies, something that wasn’t possible before. Do note that this will not download test dependencies recursively, but only for the specified package.

Subrepositories ¶

The Go repository includes a number of subrepositories, most prominently go.tools. They contain components of Go that are either not directly part of it or that are being developed independently of Go itself, such as godoc, which has recently been moved to said subrepository. Andrew Gerrand correctly pointed out that I should be taking a look at these subrepositories as well. And I will, in due time, as soon as I catch up with the changes on the main repository. Apparently there have been some nice changes to go vet that we will be checking out soon!

¹ At least as long as archive/zip doesn’t include them, because it is not possible to register a compression method that has already been registered.
² Even though Sum() will have to create a new digest for every invocation, so will the old API internally, since it operates on a copy of the digest.