A Struct Sockaddr Sequel
Source: Hacker News
Background
One of the many objectives of the Linux Kernel Self‑Protection Project (KSPP), which just completed ten years of work, is to ensure that all array references can be bounds‑checked, even in the case of flexible array members whose size is not known at compile time. A particularly challenging flexible array member in the kernel is not even declared as such. Almost exactly one year ago, LWN looked at the effort to increase safety around the networking subsystem’s heavily used sockaddr structure. One year later, Kees Cook is still looking for a way to bring this work to a close.
The Problem with struct sockaddr
Traditionally, struct sockaddr is defined as:
struct sockaddr {
short sa_family;
char sa_data[14];
};
The sa_data field was large enough to hold a network address in the early 1980s when this structure was first defined for BSD Unix, but it is not large enough now. Consequently, a great deal of code—both in the kernel and in user space—passes around struct sockaddr pointers that actually point to larger structures containing more space for the addresses they need to hold. In other words, sa_data is being treated as a flexible array member, even though it is not declared as one.
The prevalence of struct sockaddr has thrown a spanner into many attempts to better check the uses of array members in structures.
At the end of last year’s episode, much of the kernel had been changed to use struct sockaddr_storage (implemented as struct __kernel_sockaddr_storage), which has a data array large enough to hold any known network address. An attempt was made to change the definition of struct sockaddr to make its sa_data field an explicit flexible array member, but that work ran into a snag:
- Many places in the kernel embed
struct sockaddrwithin another structure. - In most of those cases
sa_datais not treated as a flexible array member. - Developers have freely embedded
struct sockaddranywhere within the containing structure, often not at the end.
If sa_data is redefined as a flexible array member, the compiler no longer knows how large the structure will actually be. Consequently, it cannot lay out a structure containing struct sockaddr, emitting tens of thousands of warnings in a kernel build. Kernel developers would rather face the prospect of an array overflow than a flood of warnings, so this work stalled.
Proposed Solution: struct sockaddr_unsized
One possible solution would be to replace embedded struct sockaddr fields with struct sockaddr_storage, eliminating the flexible array member. However, that would bloat the containing structures with unnecessary memory, making the approach unpopular.
Instead, Cook is working on a patch series that introduces another sockaddr variant:
struct sockaddr_unsized {
__kernel_sa_family_t sa_family; /* address family, AF_xxx */
char sa_data[]; /* flexible address data */
};
Its purpose is to be used in internal network‑subsystem interfaces where the size of sa_data needs to be flexible, but where its actual size is also known. For example, the bind() method in struct proto_ops is defined as:
int (*bind) (struct socket *sock,
struct sockaddr *myaddr,
int sockaddr_len);
The type of myaddr can be changed to struct sockaddr_unsized * since sockaddr_len gives the real size of the sa_data array. Cook’s patch series makes many such replacements, eliminating the use of variably sized sockaddr structures in the networking subsystem. With that done, there are no more uses of struct sockaddr that read beyond the 14‑byte sa_data array. As a result, struct sockaddr can be reverted to its classic, non‑flexible definition, and array bounds checking can be applied to code using that structure.
Outlook
The change is enough to make all of those warnings disappear, which many see as a good stopping point. However, the numerous sockaddr_unsized structures remain potential sources of catastrophic overflows. Once the dust settles, attention will likely turn to implementing bounds checking for those structures.
One possible approach mentioned in the patch set is to eventually add an sa_data_len field, allowing the structure to contain the length of its sa_data array. This would make it easy to document the relationship between the fields with the counted_by() annotation, enabling the compiler to insert bounds checks.
While the ability to write new code in Rust holds promise for reducing memory‑safety bugs in the kernel, the simple fact is that the kernel contains a huge amount of C code that will not disappear anytime soon. Anything that can make that code safer is welcome. The many variations of struct sockaddr may seem silly, but they are part of the process of bringing a bit of safety to an API defined over 40 years ago. Ten years of KSPP have made the kernel safer, but the job is far from done.