What's happening in Go tip (2013-08-23)

Published: Friday, August 23, 2013
Last modified: Sunday, January 15, 2023

Last week I published the first article in an ongoing series on changes in Go tip. It was met with a lot of positive feedback, so here’s the second article. Thank you for your support and I hope you enjoy this article just as much as the previous one.

What’s happening ¶

This time we’ll be looking at:

A new slicing syntax
Performance improvements
Fast, constant-time P-256 elliptic curves
Where has godoc gone?

A new slicing syntax ¶

Relevant CLs: CL 10743046, CL 12931044

The first change we will be looking at is probably the most controversial addition to Go 1.2: A new slicing syntax that allows setting the capacity of a slice. But before going into why it’s controversial let’s see what it’s about first.

Everyone should be familiar with the slicing operation mySlice[i:j], creating a new slice, ranging from i to j. mySlice and the new slice will share the same backing array and manipulating one slice will affect the other. That’s all pretty well known and expected behaviour.

What some people might not realize is that the two slices will share the capacity, too. A slice has a length and a capacity; the number of elements it can see right now and the number of elements in can potentially access. The capacity is mostly relevant when using append()¹: When appending to a slice, it first checks if there’s enough room left in the underlying array (the capacity). If there is, it creates a new slice with bigger length, pointing to the same backing array, the same memory.

Consider the following piece of code:

orig := make([]int, 10, 20)
subSlice := orig[5:10]
fmt.Printf("len(orig) = %d, cap(orig) = %d, len(subSlice) = %d, cap(subSlice) = %d\n",
      len(orig), cap(orig), len(subSlice), cap(subSlice))
// Output: len(orig) = 10, cap(orig) = 20, len(subSlice) = 5, cap(subSlice) = 15

orig = append(orig, 1)
fmt.Println(orig)
// Output: [0 0 0 0 0 0 0 0 0 0 1]

subSlice = append(subSlice, 2)
fmt.Println(orig)
// Output: [0 0 0 0 0 0 0 0 0 0 2]

It demonstrates that subSlice, albeit being sliced to a specific window of orig, still has access to the same kind of memory – minus the first 5 elements, since slices cannot grow backwards. It also demonstrates that using append() leads to “weird” results, where we’d overwrite elements in the original slice.

Now, why is this an issue in real code? Imagine you’re writing your own memory allocator based on []byte – something that’s common when dealing with a lot of tiny allocations that shouldn’t slow down the GC. When the user asks for N bytes of memory, you return a slice that is a slice somewhere from that []byte, with length N. Now the user does something stupid: He appends to it. As we’ve seen before, this append will “leak” into memory beyond the length of the slice, and beyond what the memory allocator intended. You just overwrote someone else’s memory!

A safe alternative that still allows implementing a custom memory allocator but preventing arbitrary memory corruption would be to limit the capacity of the slice we return, so that an append wouldn’t modify memory it shouldn’t. And that safe alternative is the new slicing syntax: mySlice[i:j:k] – The first two elements behave as usual, slicing from i to j. The third element affects the capacity, where the capacity will be calculated as k - i.

k - i as the capacity simply means that k - 1 is the absolute index into the underlying array that will be reachable by the new slice, same as j - 1 being the index for the length. By making k equal to j you create a slice that cannot look past its length. And now we can write an allocator that returns N bytes of memory and doesn’t allow you to look any further.

In the introduction I wrote that this addition is controversial. Maybe you can already guess why: If you’ve kept thinking to yourself “why would I need this”, you guessed why. This feature is used very rarely. Only some very specific use-cases require it, and looking at the standard library won’t return many uses of it, either. That doesn’t mean the feature shouldn’t exist at all, since it does solve a valid problem, but it has been argued that the slicing syntax shouldn’t be extended. Functions like setcap() have been suggested to avoid adding new syntax to Go, but in the end the Go team went with the new slicing syntax. Here’s the full discussion if you’re interested in the back and forth. And yes, this will be in Go 1.2.

There’s also a design document, describing the original motivations and concerns.

Performance improvements & Garbage be gone ¶

Relevant CLs: CL 11573043, CL 9129044, CL 12894043, CL 8179043, CL 9492044, CL 9432046, CL 8819049, CL 9459044, CL 12603049, CL 12708046, CL 10079043, CL 8670044, CL 12680046, CL 12694048, CL 11874043, CL 12072045, CL 9915043, CL 12662043, CL 9462044

Go 1.2 will see a lot of performance improvements. These improvements fall into two categories: Raw higher speed thanks to better code and better compilers, and less work for the memory allocator and garbage collector thanks to less garbage.

Brad Fitzpatrick, in his quest for high performance HTTP, has been doing a lot of work on the net/http package, and related packages you often use with it. Most of his changes fall into the “less garbage” category. Since most changes are rather simple, I’ll only list their CL numbers so you can check them out. They all include benchmarks in their description: CL 9129044 (faster JSON encoding), CL 12894043 (faster ZIP compression), CL 8179043, CL 9492044 and CL 9432046 (several HTTP improvements)

There are, however, some CLs that are particularly interesting. One set of CLs are CL 8819049, CL 9459044 and CL 12603049 – The first two add a buffer pool and buffer reuse to bufio and the last one removes said buffer pool and buffer reuse, replacing it with a simple Reset method. The story behind this is that these buffers would be handed to users, and these could, by ignoring the API, potentially corrupt data of other users, leading to hard to debug issues that are alike to “use after free” errors, which Go explicitly avoids by not providing a manual way of freeing memory. Additionally, it would set a precedent of unnecessarily complex code for the sake of some additional performance.

The newly added Reset method manages to reach similar performance improvements, at the cost of forcing the user to actively reuse buffers instead of relying on the package for it. net/http would be one of these users, making use of Reset in CL 12708046. Do note that the included benchmark is relative to the bufio pool, not Go 1.1.

For the original discussion around this, see issue 6086 by Russ Cox

One last change in Brad’s name is CL 10079043, which adds coalescing of duplicate in-flight DNS lookups. Or put differently: If doing the same DNS lookup a hundred times before one of them finishes, only one actual request will be made to servers.

But not only Brad did some work, other people contributed great improvements as well.

The most notable one would be CL 8670044, which implements an integrated network poller for Windows, leading to 30% performance improvements on that platform for networking operations – A luxury that Go 1.1 already brought for Linux.

Another notable change would be CL 11573043, which considerably speeds up sync.Cond – And completely gets rid of memory allocations in the process.

CL 12680046 and CL 12694048 add fast routes to encoding/binary, handling slices of integer types, making these common types a lot faster and cheaper.

DES encryption has become five times faster (which, sadly, is still very slow) thanks to CL 11874043 and CL 12072045.

bzip2 decompresses 30% faster (CL 9915043) and the DNS client in the net package produces ~350 bytes less of garbage (CL 12662043).

And last but not least, io.Copy now prioritizes WriterTo over ReaderFrom, which can lead to <del>tremendous</del> noticeable performance improvements (and virtually no-ops in the case of ioutil.Discard) when types implement both functions.

Fast, constant-time P-256 elliptic curves ¶

Relevant CLs: CL 10551044

One of my readers pointed out this very interesting change which adds a constant-time P-256 implementation, which at the same time happens to be a lot faster than the previous implementation.

And while performance is nice, the really important bit here is “constant-time”: Since the function takes the same amount of time for all possible inputs, it avoids being vulnerable to timing attacks, a form of side channel attacks in which timing information is used to break a cipher.

Where has godoc gone? ¶

Using Go tip always requires you to be more aware of the development process and to expect breakage, weird behaviour and sudden changes.

Nevertheless, not everyone who uses Go tip has the time to check its entire development history, especially if you’ve been told to “try Go tip and see if your problem has been fixed”. As such, a fair number of people have recently been surprised and confused when, after compiling Go tip, godoc wouldn’t work anymore, either because there was no godoc binary, or because it would print an error like this one:

readTemplate: open /Users/rsc/g/go/lib/godoc/codewalk.html: no such file or directory

In either case, the fix is to run go get code.google.com/p/go.tools/cmd/godoc – godoc has been moved to the go.tools subrepository.

The future ¶

Phew. We’re still catching up with all the development that happened since Go 1.1, but we’re slowly getting there. Luckily current Go tip development has slowed down, not yielding a lot of interesting changes. And where important things change, like the godoc change, we mix them into the articles.

Attentive readers might have noticed that the previous article promised a look at shared linking support. Unfortunately it didn’t make its way into this article, to make room for more important changes and because the change to slicing syntax already took a bit to digest.

What’s planned for next week? More changes!

One can also manually reslice past the length, if the capacity allows it. This is basically what append() is doing. ↩︎