What's happening in Go tip (2013-08-30)
Welcome back to another week of Go tip. This time we’ll mostly be focusing on smaller changes and API improvements. Enjoy!
What’s happening ¶
In this week’s article we will be looking at:
- Go becoming a better citizen
- Support for encoding GIF images
- User-extensible compression methods for archive/zip
- Easier hashing with MD5 and SHA
- go get & test dependencies
- Subrepositories
Go is becoming a better citizen ¶
Relevant CLs: CL 12541052, CL 12650045, CL 13037043, CL 13038043, CL 13348045
Some of the design decisions behind Go can lead to scenarios where a misbehaving Go program destabilizes the whole operating system. For example, Go’s split stacks could grow indefinitely, until swap death occurs. Another example is the way goroutines are distributed onto system threads. While languages with more explicit threading make it (somewhat) obvious when and how many threads are going to be created, goroutines hide that fact. One particularly dangerous detail is that if a syscall or a call to cgo blocks, it will receive its own thread automatically. This means that thousands of concurrent blocking calls of that kind will lead to thousands of threads. One realistic scenario of this happening would be thousands of concurrent DNS lookups.
These issues have been known and acknowledged for a while, but Go 1.2 will finally address them, by introducing various limitations, some of which are configurable, while others are not.
To tackle infinite recursion, a fixed limit on the size of stacks has been added. For 64-bit systems, that limit is 1 GB and for 32-bit systems it is 250 MB. Do note that the specific limits might still be adjusted if they turn out to be “implausibly large”.
A somewhat related change is CL 12650045, which moves all variables that are bigger than 10 MB to the heap. Admittedly, that change was originally made to avoid an issue in the compiler, but it will also make sure that big allocations won’t hit the stack size limit.
On the topic of limits, there is now a limit on the number of threads that may exist. Exceeding that limit will result
in a crash (as opposed to trying to stay within the limit, which could lead to deadlocks). The limit is set to 10,000
threads by default, but is configurable through runtime/debug.SetMaxThreads(). This limit addresses the issue of
syscalls or cgo calls unexpectedly leading to too many threads.
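To make the configurable part concrete, here is a minimal sketch of adjusting the thread limit. The helper name lowerThreadLimit and the value 2000 are my own, chosen purely for illustration; the only real API used is debug.SetMaxThreads, which returns the previous setting.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// lowerThreadLimit is a hypothetical helper: it sets a new cap on the number
// of OS threads the program may use and returns the previous cap.
func lowerThreadLimit(limit int) (previous int) {
	return debug.SetMaxThreads(limit)
}

func main() {
	// 2000 is an arbitrary example value, not a recommendation.
	prev := lowerThreadLimit(2000)
	fmt.Println("previous thread limit:", prev)

	// Restore the earlier setting so the rest of the program is unaffected.
	debug.SetMaxThreads(prev)
}
```

Keep in mind that lowering the limit does not make the program use fewer threads; it only makes it crash earlier when it tries to exceed the cap.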
A more specific case of limiting threads is that of DNS lookups in the net package, which are now limited to a
fixed 500 concurrent lookups. This change has been made in addition to the coalescing of in-flight DNS lookups that I
mentioned last week. Not only do these two changes ensure that you won’t bombard your system with new threads, they
also work around the issue that some resolvers cannot handle more than 1024 concurrent lookups.
This set of changes reduces a lot of the “risks” and hidden growth behaviors that Go has, by enforcing upper limits. It
is, however, still important that you, as the programmer, are aware of these limits. While forcefully crashing your
program makes sure that the system stays stable, it’s not exactly a good thing to happen in production systems. You will
still have to avoid infinite recursion for obvious reasons, and you should still keep track of how many potential
threads you are going to create. While the net package takes responsibility for maintaining a hard limit, other
packages do not, and staying within the limits is usually the programmer’s responsibility.
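One common way of taking on that responsibility yourself is a counting semaphore built from a buffered channel. The following is a sketch of that general technique, not the code the net package uses; runLimited and its parameters are names I made up, and the sleep merely stands in for a blocking syscall or cgo call.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// runLimited calls work(i) for every i in [0, n), but lets at most limit calls
// run at the same time. It returns the highest number of calls that were
// observed running concurrently.
func runLimited(n, limit int, work func(i int)) int {
	sem := make(chan struct{}, limit) // counting semaphore with limit slots
	var (
		wg      sync.WaitGroup
		mu      sync.Mutex
		running int
		peak    int
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		sem <- struct{}{} // blocks while limit calls are already in flight
		go func(i int) {
			defer wg.Done()
			defer func() { <-sem }() // free the slot when the call is done

			mu.Lock()
			running++
			if running > peak {
				peak = running
			}
			mu.Unlock()

			work(i)

			mu.Lock()
			running--
			mu.Unlock()
		}(i)
	}
	wg.Wait()
	return peak
}

func main() {
	// The sleep is a stand-in for a blocking call such as a DNS lookup.
	peak := runLimited(100, 8, func(int) { time.Sleep(time.Millisecond) })
	fmt.Println("peak concurrency:", peak)
}
```

Because the slot is acquired before the goroutine is started, no more than limit goroutines, and therefore no more than limit blocked threads from this code path, exist at any time.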
Support for encoding GIF images ¶
Relevant CLs: CL 10977043, CL 10896043, CL 10890045
While it has been possible to decode GIF images for a long time, encoding GIF images wasn’t possible until now.
Go 1.2 will add a GIF encoder, and because encoding GIF images requires quantization, it also adds facilities for
implementing quantizers. For this, two new interfaces have been added to the image/draw package.
The first interface is Drawer, which encapsulates the idea of drawing a section of a source image onto a destination
image. A concrete drawer could for example implement Floyd-Steinberg error diffusion. That one is actually included with
Go and used by default by the GIF encoder.
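As a rough illustration of what a Drawer does, the sketch below uses the bundled draw.FloydSteinberg drawer to dither a grayscale gradient down to the Web-safe palette. The function ditheredGradient and the test image are made up for this example.

```go
package main

import (
	"fmt"
	"image"
	"image/color"
	"image/color/palette"
	"image/draw"
)

// ditheredGradient builds a w×h grayscale gradient and draws it onto a
// paletted image using Floyd-Steinberg error diffusion.
func ditheredGradient(w, h int) *image.Paletted {
	src := image.NewRGBA(image.Rect(0, 0, w, h))
	for y := 0; y < h; y++ {
		for x := 0; x < w; x++ {
			v := uint8(255 * x / w)
			src.Set(x, y, color.RGBA{R: v, G: v, B: v, A: 255})
		}
	}

	// The destination can only hold colors from the Web-safe palette, so the
	// drawer has to approximate every source pixel.
	dst := image.NewPaletted(src.Bounds(), palette.WebSafe)
	draw.FloydSteinberg.Draw(dst, dst.Bounds(), src, src.Bounds().Min)
	return dst
}

func main() {
	dst := ditheredGradient(16, 4)
	fmt.Println("bounds:", dst.Bounds(), "palette size:", len(dst.Palette))
}
```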
The second interface is Quantizer, which consists of the single method Quantize. Its responsibility is to take an
image and return a color palette of the requested size. This palette will be used by the GIF encoder to produce a
paletted image. Go does not come with an implementation of Quantizer, though, and the GIF encoder will use the Plan 9
palette by default. Alternatively there’s also the “Web-safe palette”, aka the Netscape Color Cube. Both palettes can
be found in the image/color/palette package.
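Since no Quantizer ships with Go, here is a deliberately naive one to show the shape of the interface. fixedQuantizer is a hypothetical type of my own that ignores the image and always proposes the same colors; a real quantizer would analyze the pixels (median cut, for instance). It is plugged into the encoder through gif.Options.

```go
package main

import (
	"bytes"
	"fmt"
	"image"
	"image/color"
	"image/draw"
	"image/gif"
)

// fixedQuantizer is a hypothetical, deliberately naive Quantizer: it ignores
// the image and always proposes the same list of colors.
type fixedQuantizer struct {
	colors []color.Color
}

// Quantize appends colors to p until p is full or the list is exhausted.
func (q fixedQuantizer) Quantize(p color.Palette, m image.Image) color.Palette {
	for _, c := range q.colors {
		if len(p) == cap(p) {
			break
		}
		p = append(p, c)
	}
	return p
}

// Compile-time check that fixedQuantizer satisfies the interface.
var _ draw.Quantizer = fixedQuantizer{}

// checkerboard builds an n×n black and white test image.
func checkerboard(n int) *image.RGBA {
	img := image.NewRGBA(image.Rect(0, 0, n, n))
	for y := 0; y < n; y++ {
		for x := 0; x < n; x++ {
			if (x+y)%2 == 0 {
				img.Set(x, y, color.White)
			} else {
				img.Set(x, y, color.Black)
			}
		}
	}
	return img
}

// encodeBlackAndWhite encodes src as a GIF with a two-color palette.
func encodeBlackAndWhite(src image.Image) ([]byte, error) {
	var buf bytes.Buffer
	opts := &gif.Options{
		NumColors: 2,
		Quantizer: fixedQuantizer{colors: []color.Color{color.Black, color.White}},
	}
	if err := gif.Encode(&buf, src, opts); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	data, err := encodeBlackAndWhite(checkerboard(8))
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Println("encoded", len(data), "bytes")
}
```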
And because everybody likes animated GIFs, the new encoder also includes a function for producing those: EncodeAll.
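A small sketch of how EncodeAll could be used: every frame is an *image.Paletted, paired with a delay in hundredths of a second. The helper names animatedBar and frameCount, the frame contents, and the delay value are invented for this example.

```go
package main

import (
	"bytes"
	"fmt"
	"image"
	"image/color/palette"
	"image/gif"
)

// animatedBar builds an animation in which a bar grows by one column per
// frame, and returns the encoded GIF.
func animatedBar(frames, height int) ([]byte, error) {
	anim := &gif.GIF{}
	for f := 0; f < frames; f++ {
		img := image.NewPaletted(image.Rect(0, 0, frames, height), palette.Plan9)
		for x := 0; x <= f; x++ {
			for y := 0; y < height; y++ {
				img.SetColorIndex(x, y, 255) // last entry of the Plan 9 palette
			}
		}
		anim.Image = append(anim.Image, img)
		anim.Delay = append(anim.Delay, 10) // delay in 100ths of a second
	}

	var buf bytes.Buffer
	if err := gif.EncodeAll(&buf, anim); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// frameCount decodes data and reports how many frames it contains.
func frameCount(data []byte) (int, error) {
	g, err := gif.DecodeAll(bytes.NewReader(data))
	if err != nil {
		return 0, err
	}
	return len(g.Image), nil
}

func main() {
	data, err := animatedBar(5, 3)
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	n, err := frameCount(data)
	fmt.Println("frames:", n, "error:", err)
}
```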
User-extensible compression methods for archive/zip ¶
Relevant CLs: CL 12421043
CL 12421043 makes it possible to add new compression methods to archive/zip without having to modify the package
itself. Previously, a hard-coded switch over the known and implemented methods was used (at the time only
DEFLATE and storing without compression). Now new methods can be registered, with compressors and decompressors
registered separately.
In total the ZIP specification allows 11 different compression methods, out of which Go only supports 2. With this
change, the missing ones can be provided by third-party packages. It also allows swapping out implementations for
better ones[1].
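Here is a sketch of what registering a method could look like. The method ID 0xFF01 is an arbitrary number I picked for this example and is not assigned by the ZIP specification, and reusing compress/flate under a different ID is only a stand-in for a real third-party codec. Registration happens in init because registering the same ID twice panics.

```go
package main

import (
	"archive/zip"
	"bytes"
	"compress/flate"
	"fmt"
	"io"
)

// methodExample is a hypothetical method ID used only in this sketch.
const methodExample uint16 = 0xFF01

func init() {
	// Compressor and decompressor are registered separately.
	zip.RegisterCompressor(methodExample, func(w io.Writer) (io.WriteCloser, error) {
		return flate.NewWriter(w, flate.BestCompression)
	})
	zip.RegisterDecompressor(methodExample, func(r io.Reader) io.ReadCloser {
		return flate.NewReader(r)
	})
}

// roundTrip writes content into an in-memory archive using methodExample and
// reads it back again.
func roundTrip(name string, content []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	fw, err := zw.CreateHeader(&zip.FileHeader{Name: name, Method: methodExample})
	if err != nil {
		return nil, err
	}
	if _, err := fw.Write(content); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}

	zr, err := zip.NewReader(bytes.NewReader(buf.Bytes()), int64(buf.Len()))
	if err != nil {
		return nil, err
	}
	if len(zr.File) != 1 {
		return nil, fmt.Errorf("expected 1 file, found %d", len(zr.File))
	}
	rc, err := zr.File[0].Open()
	if err != nil {
		return nil, err
	}
	defer rc.Close()
	return io.ReadAll(rc)
}

func main() {
	out, err := roundTrip("hello.txt", []byte("hello, world"))
	fmt.Printf("%q %v\n", out, err)
}
```

Archives written this way can of course only be read by programs that know the same method ID, which is why a made-up ID like this one is only useful for experiments.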
Easier hashing with MD5 and SHA ¶
Relevant CLs: CL 10624044, CL 10571043, CL 10629043, CL 10630043
Like a lot of things in Go that transform streams of bytes, the packages for cryptographic hashes (MD5, SHA1 etc) are
implemented as io.Writers. You create a hash, you write data to it, you then ask for the result. In addition to that,
the hash.Hash interface has been designed with performance in mind, which is why the Sum() method takes a slice as
input, to allow buffer reuse. And even though passing nil is possible, it’s still a somewhat involved process and
often leads to confusion for newbies who are used to easier solutions a la “take these bytes, give me the result”.
The following piece of code demonstrates the current process:
data := []byte(`hello, world`)
h := sha1.New()
h.Write(data)
hash := h.Sum(nil)
fmt.Printf("H(data) = %x\n", hash)
// Output: H(data) = b7e23ec29af22b0b4e41da31e868d57226121c84
And because the Go team agrees that there should be an easier way to calculate hashes, Go 1.2 will add simple package-level functions that take the bytes to hash as input and return the hash. The previous example can be reduced to this:
data := []byte(`hello, world`)
hash := sha1.Sum(data)
fmt.Printf("H(data) = %x\n", hash)
// Output: H(data) = b7e23ec29af22b0b4e41da31e868d57226121c84
Sum() has been implemented with “no allocations” in mind, which is why it returns an array instead of a slice. Given
the fact that the only difference between the new package-level Sum() and the old API is the lack of appending to a
slice[2], the new function should be just as fast, if not marginally and almost unmeasurably faster.
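The array return value has one practical consequence: functions that expect a []byte need the result sliced with [:]. The sketch below shows that, and also that the other hash packages follow the same pattern (md5.Sum, sha256.Sum256). The helper names hexSHA1 and hexSHA1Streaming are mine.

```go
package main

import (
	"crypto/md5"
	"crypto/sha1"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// hexSHA1 uses the new package-level function. sha1.Sum returns a
// [sha1.Size]byte array, so it is sliced before being hex encoded.
func hexSHA1(data []byte) string {
	sum := sha1.Sum(data)
	return hex.EncodeToString(sum[:])
}

// hexSHA1Streaming computes the same value through the hash.Hash interface.
func hexSHA1Streaming(data []byte) string {
	h := sha1.New()
	h.Write(data)
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	data := []byte("abc")
	fmt.Println("sha1:     ", hexSHA1(data))
	fmt.Println("same as streaming API:", hexSHA1(data) == hexSHA1Streaming(data))

	// The other hash packages gained equivalent functions.
	fmt.Printf("md5:       %x\n", md5.Sum(data))
	fmt.Printf("sha256:    %x\n", sha256.Sum256(data))
}
```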
go get & test dependencies ¶
Relevant CLs: CL 12566046
CL 12566046 adds the -t flag to go get, which makes it download test dependencies as well, something that wasn’t
possible before. Do note that this does not work recursively: only the test dependencies of the specified packages
are downloaded, not the test dependencies of their dependencies.
Subrepositories ¶
The Go repository includes a number of subrepositories, most prominently go.tools, which contain some components of Go
that are either not directly part of it or that are being developed independently of Go itself, such as godoc, which has
recently been moved to said subrepository. Andrew Gerrand correctly pointed out that I should be taking a look at these
subrepositories as well. And I will, in due time, as soon as I catch up with the changes on the main repository.
Apparently there have been some nice changes to go vet that we will be checking out soon!
[1] At least as long as archive/zip doesn’t include them, because it is not possible to register a compression method that has already been registered.
[2] Even though Sum() will have to create a new digest for every invocation, so will the old API internally, since it operates on a copy of the digest.