What's happening in Go tip (2013-08-30)
Welcome back to another week of Go tip. This time we’ll mostly be focusing on smaller changes and API improvements. Enjoy!
What’s happening
In this week’s article we will be looking at:
- Go becoming a better citizen
- Support for encoding GIF images
- User-extensible compression methods for archive/zip
- Easier hashing with MD5 and SHA
- go get & test dependencies
- Subrepositories
Go is becoming a better citizen
Relevant CLs: CL 12541052, CL 12650045, CL 13037043, CL 13038043, CL 13348045
Several of the design decisions behind Go could occasionally lead to scenarios where a misbehaving Go program destabilizes the whole operating system. For example, Go’s split stacks could grow indefinitely, until the machine swaps itself to death. Another example is the way goroutines are distributed onto system threads. While languages with more explicit threading make it (somewhat) obvious when and how many threads are going to be created, goroutines hide that fact. One particularly dangerous detail is that a blocking syscall or cgo call automatically receives its own thread. This means that thousands of concurrent blocking calls of that kind will lead to thousands of threads. A realistic scenario for this would be thousands of concurrent DNS lookups.
These issues have been known and acknowledged for a while, but Go 1.2 will finally address them, by introducing various limitations, some of which are configurable, while others are not.
To tackle infinite recursion, a fixed limit on the size of stacks has been added. For 64-bit systems, that limit is 1 GB and for 32-bit systems it is 250 MB. Do note that the specific limits might still be adjusted if they turn out to be “implausibly large”.
A somewhat related change is CL 12650045, which moves all variables that are bigger than 10 MB to the heap. Admittedly, that change was originally made to avoid an issue in the compiler, but it will also make sure that big allocations won’t hit the stack size limit.
On the topic of limits, there is now a limit on the number of threads that may exist. Exceeding that limit will result in a crash (as opposed to trying to stay within the limit, which could lead to deadlocks). The limit is set to 10,000 threads by default, but is configurable through runtime/debug.SetMaxThreads(). This limit addresses the issue of syscalls or cgo calls unexpectedly leading to too many threads.
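To make the configurable part more concrete, here is a minimal sketch of adjusting the limits at program start. SetMaxThreads is the function named above; SetMaxStack is its stack-size counterpart in runtime/debug as found in released Go versions, and it is not mentioned in the CLs discussed here. The helper name tightenLimits and the chosen values (64 MB, 2,000 threads) are made up for illustration.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// tightenLimits is a hypothetical helper: it sets a new maximum stack size
// (in bytes, per goroutine) and a new maximum number of OS threads, and
// returns the settings that were in effect before the call.
func tightenLimits(maxStackBytes, maxThreads int) (prevStack, prevThreads int) {
	prevStack = debug.SetMaxStack(maxStackBytes)
	prevThreads = debug.SetMaxThreads(maxThreads)
	return prevStack, prevThreads
}

func main() {
	// Illustrative values only: 64 MB stacks and at most 2,000 threads.
	prevStack, prevThreads := tightenLimits(64<<20, 2000)
	fmt.Printf("previous stack limit: %d bytes\n", prevStack)
	fmt.Printf("previous thread limit: %d\n", prevThreads)
}
```

Both setters return the previous value, which makes it easy to restore the old setting later. Keep in mind that exceeding either limit still crashes the program; lowering the limits only makes it crash earlier.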
A more specific case of limiting threads is that of DNS lookups in the net package, which has now been limited to a fixed 500 concurrent lookups. This change has been made in addition to the coalescence of inflight DNS lookups that I mentioned last week. Not only will these two changes ensure that you won’t bombard your system with new threads, they also work around the issue that some resolvers cannot handle more than 1024 concurrent lookups.
This set of changes reduces a lot of the “risks” and hidden growth behaviors that Go has, by enforcing upper limits. It is, however, still important that you, as the programmer, are aware of these limits. While forcefully crashing your program makes sure that the system stays stable, it’s not exactly a good thing to happen in production systems. You will still have to avoid infinite recursion for obvious reasons, and you should still keep track of how many threads you could potentially create. While the net package takes responsibility for enforcing a hard limit of its own, other packages do not, and bounding concurrency is usually the programmer’s responsibility.
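One common way to take on that responsibility is a counting semaphore built from a buffered channel. The following is only a sketch of that general technique, not something taken from the CLs above; runLimited and sleepTasks are hypothetical names, and the sleeping tasks merely stand in for blocking syscalls or cgo calls.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// runLimited runs every task in its own goroutine, but never lets more than
// limit tasks run at the same time. It returns the highest number of tasks
// that were observed running concurrently.
func runLimited(limit int, tasks []func()) int {
	sem := make(chan struct{}, limit) // one slot per allowed concurrent task
	var wg sync.WaitGroup
	var mu sync.Mutex
	running, peak := 0, 0

	for _, task := range tasks {
		sem <- struct{}{} // blocks while limit tasks are already in flight
		wg.Add(1)
		go func(task func()) {
			defer wg.Done()
			defer func() { <-sem }() // free the slot when the task is done

			mu.Lock()
			running++
			if running > peak {
				peak = running
			}
			mu.Unlock()

			task()

			mu.Lock()
			running--
			mu.Unlock()
		}(task)
	}
	wg.Wait()
	return peak
}

// sleepTasks builds n tasks that just sleep briefly, as a stand-in for
// blocking calls.
func sleepTasks(n int) []func() {
	tasks := make([]func(), n)
	for i := range tasks {
		tasks[i] = func() { time.Sleep(2 * time.Millisecond) }
	}
	return tasks
}

func main() {
	peak := runLimited(8, sleepTasks(100))
	fmt.Println("peak concurrency:", peak)
}
```

Because the slot is acquired before the goroutine is started, the number of goroutines (and therefore the number of threads blocked in the tasks) can never exceed the chosen limit.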
Support for encoding GIF images
Relevant CLs: CL 10977043, CL 10896043, CL 10890045
While it has been possible to decode GIF images for a long time, encoding GIF images wasn’t possible until now.
Go 1.2 will add a GIF encoder, and because encoding GIF images requires quantization, it also adds facilities for implementing quantizers. For this, two new interfaces have been added to the image/draw package.
The first interface is Drawer, which encapsulates the idea of drawing a section of a source image onto a destination image. A concrete drawer could for example implement Floyd-Steinberg error diffusion. That one is actually included with Go and used by default by the GIF encoder.
The second interface is Quantizer, which includes the single method Quantize. Its responsibility is to take an image and return a color palette of the desired size. This palette will be used by the GIF encoder to produce paletted images. Go does not come with an implementation of Quantizer, though, and the GIF encoder will use the Plan 9 palette by default. Alternatively there’s also the “Web-safe palette”, aka the Netscape Color Cube. Both palettes can be found in the image/color/palette package.
And because everybody likes animated GIFs, the new encoder also includes a function for producing those: EncodeAll.
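To give an idea of how the encoder is used, the following sketch encodes a single image with Encode and a short animation with EncodeAll, based on the image/gif API as it exists in released Go versions. The helpers gradient, encodeStill, encodeAnimation and frameCount are made up for this example, as are the image sizes and frame delays.

```go
package main

import (
	"bytes"
	"fmt"
	"image"
	"image/color"
	"image/color/palette"
	"image/draw"
	"image/gif"
)

// gradient returns a small RGBA test image; it only exists for this example.
func gradient(w, h int) *image.RGBA {
	img := image.NewRGBA(image.Rect(0, 0, w, h))
	for y := 0; y < h; y++ {
		for x := 0; x < w; x++ {
			c := color.RGBA{R: uint8(x * 255 / w), G: uint8(y * 255 / h), B: 128, A: 255}
			img.Set(x, y, c)
		}
	}
	return img
}

// encodeStill encodes one image. With no Quantizer set, the encoder uses the
// Plan 9 palette; the Drawer is spelled out here although Floyd-Steinberg is
// the default anyway.
func encodeStill(img image.Image) ([]byte, error) {
	var buf bytes.Buffer
	opts := &gif.Options{NumColors: 256, Drawer: draw.FloydSteinberg}
	if err := gif.Encode(&buf, img, opts); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// encodeAnimation builds frames that use the web-safe palette, each filled
// with a different shade of red, and encodes them as one animated GIF.
func encodeAnimation(frames int) ([]byte, error) {
	anim := &gif.GIF{}
	for i := 0; i < frames; i++ {
		frame := image.NewPaletted(image.Rect(0, 0, 16, 16), palette.WebSafe)
		shade := color.RGBA{R: uint8(i * 51), A: 255}
		draw.Draw(frame, frame.Bounds(), image.NewUniform(shade), image.Point{}, draw.Src)
		anim.Image = append(anim.Image, frame)
		anim.Delay = append(anim.Delay, 10) // delay in 100ths of a second
	}
	var buf bytes.Buffer
	if err := gif.EncodeAll(&buf, anim); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// frameCount decodes GIF data again and reports how many frames it holds.
func frameCount(data []byte) (int, error) {
	g, err := gif.DecodeAll(bytes.NewReader(data))
	if err != nil {
		return 0, err
	}
	return len(g.Image), nil
}

func main() {
	still, err := encodeStill(gradient(16, 16))
	if err != nil {
		panic(err)
	}
	anim, err := encodeAnimation(5)
	if err != nil {
		panic(err)
	}
	n, err := frameCount(anim)
	if err != nil {
		panic(err)
	}
	fmt.Printf("still: %d bytes, animation: %d bytes, %d frames\n", len(still), len(anim), n)
}
```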
User-extensible compression methods for archive/zip
Relevant CLs: CL 12421043
CL 12421043 improves archive/zip to allow adding new compression methods without having to modify archive/zip directly. While previously a hard-coded switch for known and implemented methods was used (at the time only DEFLATE and storing without compression), it now allows for registering new methods (compressors and decompressors separately).
In total the ZIP specification allows 11 different compression methods, out of which Go only supports 2. With this change, the remaining methods can be provided by third-party packages. It also allows swapping out implementations for better ones¹.
¹: At least as long as archive/zip doesn’t include them, because it is not possible to register a compression method that has already been registered.
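As a rough illustration of the registration mechanism, the sketch below registers a compressor and a decompressor under a private method ID and round-trips a file through it. It uses zip.RegisterCompressor and zip.RegisterDecompressor as they exist in released Go versions. The method ID 0xff00 is an arbitrary value picked for this example and not a method defined by the ZIP specification, and the "new" method is simply DEFLATE at its best compression level; roundTrip is a hypothetical helper.

```go
package main

import (
	"archive/zip"
	"bytes"
	"compress/flate"
	"fmt"
	"io"
)

// methodBestDeflate is a made-up, private method ID used only in this
// example. Other ZIP tools will not know how to read entries that use it.
const methodBestDeflate uint16 = 0xff00

func init() {
	// Registering the same method ID twice panics, so this is done exactly
	// once, in init.
	zip.RegisterCompressor(methodBestDeflate, func(w io.Writer) (io.WriteCloser, error) {
		return flate.NewWriter(w, flate.BestCompression)
	})
	zip.RegisterDecompressor(methodBestDeflate, func(r io.Reader) io.ReadCloser {
		return flate.NewReader(r)
	})
}

// roundTrip writes content into an in-memory archive using the custom
// method, then reads it back and returns what was read.
func roundTrip(name string, content []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	w, err := zw.CreateHeader(&zip.FileHeader{Name: name, Method: methodBestDeflate})
	if err != nil {
		return nil, err
	}
	if _, err := w.Write(content); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}

	zr, err := zip.NewReader(bytes.NewReader(buf.Bytes()), int64(buf.Len()))
	if err != nil {
		return nil, err
	}
	rc, err := zr.File[0].Open()
	if err != nil {
		return nil, err
	}
	defer rc.Close()
	return io.ReadAll(rc)
}

func main() {
	got, err := roundTrip("hello.txt", []byte("hello, world"))
	if err != nil {
		panic(err)
	}
	fmt.Printf("read back %d bytes: %s\n", len(got), got)
}
```

A more practical use would be registering a method that the ZIP specification does define but archive/zip does not implement, using a compressor from a third-party package.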
Easier hashing with MD5 and SHA
Relevant CLs: CL 10624044, CL 10571043, CL 10629043, CL 10630043
Like a lot of things in Go that transform streams of bytes, the packages for cryptographic hashes (MD5, SHA1, etc.) are implemented as io.Writers. You create a hash, you write data to it, you then ask for the result. In addition to that, the hash.Hash interface has been designed with performance in mind, which is why the Sum() method takes a slice as input, to allow buffer reuse. And even though passing nil is possible, it’s still a somewhat involved process and often leads to confusion for newbies who are used to easier solutions a la “take these bytes, give me the result”.
The following piece of code demonstrates the current process:
data := []byte(`hello, world`)
h := sha1.New()
h.Write(data)
hash := h.Sum(nil)
fmt.Printf("H(data) = %x\n", hash)
// Output: H(data) = b7e23ec29af22b0b4e41da31e868d57226121c84
And because the Go team agrees that there should be an easier way to calculate hashes, Go 1.2 will add simple package-level functions that take the bytes to hash as input and return the hash. The previous example can be reduced to this:
data := []byte(`hello, world`)
hash := sha1.Sum(data)
fmt.Printf("H(data) = %x\n", hash)
// Output: H(data) = b7e23ec29af22b0b4e41da31e868d57226121c84
Please remember that even though the new API is easier to use, you should only use it when performance is not a concern and you are only calculating one/a few hashes. Otherwise, using the old interface, with reused buffers, will prove beneficial.
Sum() has been implemented with “no allocations” in mind, which is why it returns an array instead of a slice. Given the fact that the only difference between the new package-level Sum() and the old API is the lack of appending to a slice¹, the new function should be just as fast, if not marginally and almost unmeasurably faster.
¹: Even though Sum() will have to create a new digest for every invocation, so will the old API internally, since it operates on a copy of the digest.
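To put the two APIs side by side, here is a small sketch that hashes with the package-level sha1.Sum and with a reused hash.Hash plus a reused output buffer. The helper names oneShot, hashMany and sameDigest are made up for this example. Note that sha1.Sum returns an array of type [sha1.Size]byte, so it has to be sliced with [:] wherever a []byte is expected.

```go
package main

import (
	"bytes"
	"crypto/sha1"
	"fmt"
)

// oneShot uses the package-level function; the result is an array, not a
// slice.
func oneShot(data []byte) [sha1.Size]byte {
	return sha1.Sum(data)
}

// hashMany hashes all inputs with a single hash.Hash and a single output
// buffer. The digest passed to visit is only valid until the next
// iteration, because the buffer is reused.
func hashMany(inputs [][]byte, visit func(digest []byte)) {
	h := sha1.New()
	buf := make([]byte, 0, sha1.Size)
	for _, in := range inputs {
		h.Reset()
		h.Write(in)          // Write on a hash.Hash never returns an error
		buf = h.Sum(buf[:0]) // append the digest to the emptied buffer
		visit(buf)
	}
}

// sameDigest reports whether both APIs produce the same digest for data.
func sameDigest(data []byte) bool {
	want := oneShot(data)
	same := false
	hashMany([][]byte{data}, func(digest []byte) {
		same = bytes.Equal(digest, want[:])
	})
	return same
}

func main() {
	data := []byte(`hello, world`)
	sum := oneShot(data)
	fmt.Printf("H(data) = %x\n", sum)
	fmt.Println("both APIs agree:", sameDigest(data))
}
```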
go get & test dependencies
Relevant CLs: CL 12566046
CL 12566046 adds the -t flag to go get, which will make it download test dependencies, something that wasn’t possible before.
Do note that this will not download test dependencies recursively, but only for the specified package.
Subrepositories
The Go repository includes a number of subrepositories, most prominently go.tools, which contain some components of Go that are either not directly part of it or that are being developed independently of Go itself, such as godoc, which has recently been moved to said subrepository.
Andrew Gerrand correctly pointed out that I should be taking a look at these subrepositories as well. And I will, in due time, as soon as I catch up with the changes on the main repository. Apparently there have been some nice changes to go vet that we will be checking out soon!