NFS sequential read performance

Last modified: Saturday, January 28, 2023

NFS is a network protocol with non-negligible latency at the application layer, so it benefits from pipelining, i.e. keeping multiple requests outstanding at once. Most applications, however, read files through simple POSIX APIs, one chunk at a time in a loop, and often with small buffers in the range of 4 kB to 32 kB.

The Linux kernel improves sequential read speeds by employing readahead: reading data before the application requests it. For NFS mounts, the readahead size was historically 15 times the rsize, but it has more recently been changed to a fixed 128 kB. Unfortunately, 128 kB of readahead is not nearly enough to reach 10 Gbit/s speeds on NFS. We can increase the readahead size either by writing to /sys/class/bdi/<id>/read_ahead_kb (where the id can be found via mountpoint -d <path>) or by using nfsrahead.
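As a rough illustration of the sysfs route, the Python sketch below derives the BDI id from a path's device number (the same major:minor pair that mountpoint -d prints) and reads or writes read_ahead_kb. The function names and the sysfs_root parameter are invented for this example and are not part of any tool. Writing the value requires root, and the setting is assumed to apply only to the current mount.

```python
import os
from pathlib import Path

SYSFS_BDI = "/sys/class/bdi"  # location named in the text above


def bdi_id(path):
    """Return the 'major:minor' id of the filesystem that contains path."""
    dev = os.stat(path).st_dev
    return f"{os.major(dev)}:{os.minor(dev)}"


def read_ahead_file(path, sysfs_root=SYSFS_BDI):
    """Build the read_ahead_kb path for the filesystem that contains path.

    sysfs_root is a hypothetical parameter that exists only so the helper
    can be pointed at a scratch directory when experimenting.
    """
    return Path(sysfs_root) / bdi_id(path) / "read_ahead_kb"


def get_read_ahead_kb(path, sysfs_root=SYSFS_BDI):
    """Return the current readahead size in kB."""
    return int(read_ahead_file(path, sysfs_root).read_text())


def set_read_ahead_kb(path, kb, sysfs_root=SYSFS_BDI):
    """Write a new readahead size in kB (needs root on the real sysfs)."""
    read_ahead_file(path, sysfs_root).write_text(f"{int(kb)}\n")
```

For instance, set_read_ahead_kb("/mnt/nfs", 1024), with /mnt/nfs standing in for a real mount point, would request 1 MiB of readahead for that mount.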

On our 10 Gbit/s local network, increasing readahead to 2560 kB lifted our sequential bs=32K reads from 300 MB/s to 850 MB/s. Setting readahead to 1 MiB, the same as the rsize, gets us to 940 MB/s.
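A measurement of this kind can be approximated with a small script. The sketch below is not the tool used for the numbers above; it simply reads a file from start to finish in 32 KiB chunks, mimicking the one-chunk-at-a-time pattern of a typical application, and reports throughput in MB/s. The function name is made up for this example. For the result to reflect NFS rather than the local page cache, the file should not already be cached on the client.

```python
import time


def sequential_read_throughput(path, block_size=32 * 1024):
    """Read path sequentially in block_size chunks.

    Returns (total_bytes, megabytes_per_second), with 1 MB = 10**6 bytes.
    """
    total = 0
    start = time.perf_counter()
    # buffering=0 disables Python's own buffering, so each read() call
    # maps to one read system call of at most block_size bytes.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:  # empty bytes object signals end of file
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    rate = total / elapsed / 1e6 if elapsed > 0 else float("inf")
    return total, rate
```

Running it against the same large file before and after changing the readahead size gives a simple before/after comparison.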

Excessive readahead can hurt semi-random reads through read amplification, because the kernel fetches data the application never uses. Our main use case is storage of large files that are read sequentially. Other workloads will need to tune the readahead size to find the right balance.