RFC-0184: POSIX Compatibility for the System Netstack | |
---|---|
Status | Accepted |
Areas | |
Description | Describes the policy for supporting POSIX-like networking APIs on Fuchsia. |
Issues | |
Gerrit change | |
Authors | |
Reviewers | |
Date submitted (year-month-day) | 2022-07-15 |
Date reviewed (year-month-day) | 2022-08-17 |
Summary
Fuchsia aims to expose a POSIX-compatible networking API to components via fdio
and the system netstack. It also supports some non-POSIX functionality that is
common across other POSIX-oriented operating systems.
Motivation
Fuchsia's existing system netstack is built around a core that targets Linux compatibility. With a planned replacement of this netstack in the works, questions of compatibility have repeatedly been raised. This proposal puts those to rest by requiring any system netstack to target a POSIX-like API.
The POSIX networking interface describes a standard way for components to access network resources. Supporting the networking subset of POSIX for Fuchsia components makes it easy to 1) reuse existing code on Fuchsia, and 2) write new code for Fuchsia using a familiar API.
Stakeholders
Facilitator:
hjfreyer@google.com
Reviewers:
- abarth@google.com (RFC-0082 author)
- brunodalbo@google.com (Netstack)
- dhobsd@google.com (Network Policy)
Consulted:
brunodalbo@google.com, hanjh@google.com, hjfreyer@google.com, martinjeffrey@google.com, nickbrow@google.com, tamird@google.com, wildenhain@google.com
Socialization:
This RFC went through a design review with the Netstack team.
Design
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in IETF RFC 2119.
On a Fuchsia device, the system networking stack provides networking
functionality to components by exposing several fuchsia.posix.socket FIDL
services. While FIDL-aware components can target them, such direct usage is
actively discouraged. Instead, components written for the POSIX system call
API SHOULD link in the fdio compatibility library to translate libc system
calls into FIDL service calls.
Fuchsia's fdio library acts as a translation layer from a limited subset of
POSIX to the appropriate FIDL services. For networking functionality, fdio
provides implementations of a number of POSIX calls, including socket,
setsockopt and getsockopt, read, write, send, recv, and more. Together, the
FIDL service implementations and fdio provide complete implementations of
these and other calls.
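As a rough illustration of the kind of code this layering is meant to support, the sketch below uses only standard POSIX socket calls to send a UDP datagram to itself over loopback. The helper name udp_loopback_roundtrip is hypothetical, and nothing in it is Fuchsia-specific; the assumption is that a component built this way and linked against fdio would have these calls translated into FIDL service calls as described above.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: sends `msg` to a loopback UDP socket that the function
 * itself owns, then reads it back into `out`.
 * Returns the number of bytes received, or -1 if any call fails. */
ssize_t udp_loopback_roundtrip(const char *msg, char *out, size_t out_len) {
    struct sockaddr_in addr;
    socklen_t addr_len = sizeof(addr);
    int enabled = 1;
    int readback = 0;
    socklen_t readback_len = sizeof(readback);
    ssize_t received = -1;
    int fd;

    assert(msg != NULL && out != NULL);

    fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) return -1;

    /* Set an option, then read it back to confirm it took effect. */
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &enabled, sizeof(enabled)) != 0) goto done;
    if (getsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &readback, &readback_len) != 0 ||
        readback == 0) {
        goto done;
    }

    /* Bind to loopback and let the stack choose an ephemeral port. */
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) != 0) goto done;

    /* Learn which port was assigned so the socket can address itself. */
    if (getsockname(fd, (struct sockaddr *)&addr, &addr_len) != 0) goto done;

    if (sendto(fd, msg, strlen(msg), 0, (struct sockaddr *)&addr, addr_len) < 0) goto done;
    received = recv(fd, out, out_len, 0);

done:
    close(fd);
    return received;
}
```

A caller would pass a message and a buffer, for example udp_loopback_roundtrip("ping", buf, sizeof(buf)), and treat a negative return value as failure.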
Instead of providing a complete list of system calls, the goal of this document
is to define the criteria used for deciding when to implement POSIX- or
peer-system-defined networking functionality. The expectation is that fdio and
the networking stack will implement system calls and options as needed,
primarily for first-party Fuchsia code, and then when required by other
applications. POSIX functions that can be implemented by using the
fdio-provided system calls are out of the scope of this document.
Note that while this proposal mandates that Fuchsia provide a POSIX-compatible networking interface, it does not require its usage. In particular, nothing precludes the future specification or development of a Fuchsia-first API to be implemented alongside POSIX.
Implementation
The existing fdio library and system networking stack already provide a
mostly-POSIX-compatible interface to components. This proposal is intended to
codify the as-yet informal decision to aim for parity with POSIX-like operating
systems, and to guide future development on the networking stack and fdio
library. When making changes to the system netstack and fdio, the following
three principles SHOULD be considered:
POSIX compliance
Fuchsia's system netstack and the fdio library aim to provide compatibility
for the networking API specified by POSIX. Components that target the POSIX
networking API SHOULD work as expected when linked with fdio and routed the
appropriate socket creation capabilities.
Compatibility with peer systems
The POSIX specification leaves the behavior of some interactions undefined, and so components written against the POSIX interface often expect and account for the behavior of a particular operating system or family of operating systems. Where this behavior is well-defined and consistent across multiple POSIX-like operating systems, Fuchsia's networking subsystem SHOULD match it (except in limited cases, as described below). When the behavior of peer systems is inconsistent, Fuchsia is not guaranteed to match any particular peer system's behavior.
Allowance for divergent behavior
The Fuchsia networking subsystem may need to implement behavior that is different from that of peer systems. Such divergence SHOULD arise when
- The behaviors of peer systems are inconsistent with one another,
- Implementing behavior consistent with a peer system would introduce a security risk, or
- Implementing consistent behavior would be difficult or impossible due to Fuchsia's architectural constraints.
In those cases, the divergence SHOULD be well-motivated, well-documented, and well-tested. Furthermore, the difference in behavior SHOULD be easily observable by components (e.g. a POSIX system call returning an error).
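To make "easily observable" concrete, the following is a minimal sketch of how a component might detect that a socket option is not honored, relying only on the return value of setsockopt and errno. The names option_result and try_set_int_option are hypothetical, and the choice of ENOPROTOOPT and EOPNOTSUPP as the "unsupported" signals is an assumption based on common POSIX-like behavior, not something this RFC specifies for Fuchsia.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical classification of a setsockopt attempt. */
enum option_result {
    OPTION_APPLIED,      /* the stack accepted the option */
    OPTION_UNSUPPORTED,  /* the stack reported that it does not implement it */
    OPTION_FAILED        /* any other error (bad descriptor, bad value, ...) */
};

/* Tries to set an int-valued socket option and reports the outcome, so the
 * caller can choose a fallback instead of silently misbehaving. */
enum option_result try_set_int_option(int fd, int level, int optname, int value) {
    if (setsockopt(fd, level, optname, &value, sizeof(value)) == 0) {
        return OPTION_APPLIED;
    }
    /* Assumption: an unimplemented option is reported with one of these. */
    if (errno == ENOPROTOOPT || errno == EOPNOTSUPP) {
        return OPTION_UNSUPPORTED;
    }
    return OPTION_FAILED;
}
```

A caller can branch on OPTION_UNSUPPORTED to pick an alternative strategy, which only works if divergent behavior surfaces as an error rather than as a silent no-op.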
Known Limitations
POSIX makes use of several global identifier spaces, including UIDs, GIDs, PIDs, and file paths. Many of these identifiers are used alongside built-in support for capabilities to limit access to networking operations on POSIX-like systems. These include (but are not limited to):
- Binding sockets on the same address with SO_REUSEPORT and SO_REUSEADDR is
  restricted to components running with the same UID.
- On Linux, the ability to clear the SO_BINDTODEVICE socket option is limited to
  applications running with CAP_NET_RAW.
- On Linux, the ability to create raw IP sockets is limited to applications
  running with CAP_NET_RAW.
- On Linux, binding sockets to low-numbered ports requires an application to
  have CAP_NET_BIND_SERVICE (though it is an unprivileged operation on recent
  macOS versions).
Where possible, these behaviors will be supported on Fuchsia, though doing so is
subject to the feasibility of mapping their functionality to Fuchsia concepts,
and may require allowances for Fuchsia's architectural constraints. As an
example, POSIX-like systems implicitly use a process's UID to scope port
sharing permissions. Since Fuchsia doesn't have UIDs, components will need to
take explicit action to opt in to port sharing, likely in the form of additional
calls into fdio.
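Because privilege checks such as the low-numbered port restriction differ between systems, portable code tends to treat them as runtime outcomes rather than assumptions. The sketch below, using a hypothetical helper named bind_preferred_or_ephemeral, tries a preferred port and falls back to an ephemeral one when the stack refuses with EACCES or EADDRINUSE. How Fuchsia treats low-numbered ports is not stated here, so the code makes no assumption about which branch is taken.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: binds `fd` to loopback on `preferred_port`, falling
 * back to a stack-chosen port if the preferred one is denied or in use.
 * Returns the bound port in host byte order, or 0 on failure. */
uint16_t bind_preferred_or_ephemeral(int fd, uint16_t preferred_port) {
    struct sockaddr_in addr;
    socklen_t addr_len = sizeof(addr);

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(preferred_port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) != 0) {
        /* Only fall back for "not permitted" or "already taken". */
        if (errno != EACCES && errno != EADDRINUSE) return 0;
        addr.sin_port = 0; /* ask the stack for an ephemeral port */
        if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) != 0) return 0;
    }

    /* Report the port that was actually assigned. */
    if (getsockname(fd, (struct sockaddr *)&addr, &addr_len) != 0) return 0;
    return ntohs(addr.sin_port);
}
```

On a system that enforces the restriction, an unprivileged call with port 80 would return an ephemeral port; on one that does not, it would return 80 if that port is free.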
Performance
As part of the formalization of support for the POSIX networking interface,
Fuchsia's networking subsystem will provide a high-performance implementation of
the API. The Fuchsia networking stack and fdio already have significant
benchmarking tooling that exercises the POSIX interface. This will be used to
measure performance improvements and detect regressions.
Ergonomics
By targeting POSIX, a well-known and common interface for applications, Fuchsia makes it easy for developers to port existing code and provides a familiar interface for writing new code. Though some POSIX concepts do not map directly to Fuchsia (e.g. UIDs), the vast majority of networking concepts do. Targeting a familiar interface for networking will significantly improve the experience of developing on and porting code to Fuchsia.
Backwards Compatibility
This proposal does not represent a change of principles, just a codification of informal ones. Since no change is being introduced, considerations for backwards compatibility are minimal.
Security considerations
This RFC doesn't introduce any new security considerations as it codifies an existing set of informal principles. Furthermore, the commitment to providing a POSIX-compatible API does not preclude future per-component isolation or sharding of the networking stack to address security issues.
Privacy considerations
This proposal does not introduce any new privacy considerations as it only codifies the support of already-in-use POSIX APIs.
Testing
The Fuchsia system netstack is tested using an existing compatibility suite that checks conformance against POSIX and Linux (though the latter is a matter of convenience and does not imply an endorsement of Linux behavior). This test suite helps prevent regressions and guides future feature development by encoding the expected behavior of the system in response to POSIX calls. Intentional behavior differences between Fuchsia's networking subsystem and POSIX or POSIX-like systems are encoded and documented in the test suite. Known unintentional differences are also encoded and documented, and tagged in Fuchsia's bug-tracking system. This integration-level testing, plus existing unit tests for the internals of the system netstack, provides sufficient coverage for POSIX compatibility.
Documentation
This proposal requires two additional elements of documentation:
- Instructions for how to use the fdio API to communicate with the system
  netstack, and
- A list of divergent behaviors between the Fuchsia netstack/fdio and
  POSIX/POSIX-like system behavior.
Drawbacks, alternatives, and unknowns
This proposal requires committing to implementing a significant subset of POSIX
and peer-compatible behavior across the Fuchsia system netstack and fdio
library. Because this proposal is a formalization of existing plans, much of the
functionality already exists. This proposal commits Fuchsia to expanding the
existing API surface, and to supporting it in the long term.
While POSIX is a well-known standard, it was designed without support for capabilities or the rich IPC specification mechanisms that Fuchsia has. Supporting compatibility with POSIX-like systems requires both providing to components a more limited interface (synchronous system calls, untyped file descriptors) and shoehorning those concepts into Fuchsia primitives. In addition, adopting a POSIX-like interface for Fuchsia components may hinder development of a useful Fuchsia-first networking API.
As a more radical option, Fuchsia could explicitly eschew POSIX compatibility in favor of a Fuchsia-first API. Given that a large amount of Fuchsia system service code is already written to target a POSIX-like API, this seems both counterproductive and short-sighted.
Regarding unknowns, the major expected categories are areas of incomplete implementation for POSIX support, and incompatible behavior for Fuchsia relative to peer operating systems. These will need to be addressed and documented as instances arise or are discovered.
Prior art and references
- The POSIX 2017 specification lists the requirements of a POSIX-compatible system.
- RFC-0082 describes Fuchsia's goal of running unmodified Linux programs on Fuchsia.
- The gVisor project's networking code forms the core of the existing Fuchsia system netstack.
Appendix: Implementation decision case study
POSIX's setsockopt function provides a method for code to set options on a
socket that affect its behavior. POSIX defines several option flags, but
compliant systems are allowed to add their own custom flags. One fairly
commonly used option is SO_REUSEPORT, which, when set, allows multiple sockets
to be bound to the exact same address and port. Since its semantics are
well-defined and consistent across multiple systems (including FreeBSD, macOS,
and other BSD derivatives), Fuchsia's networking stack allows components to set
the SO_REUSEPORT option on UDP sockets.
One of the differences between Linux and BSD-derived implementations of
SO_REUSEPORT is Linux's requirement that sockets being bound to the same
address belong to processes with the same user ID. Since Fuchsia's architecture
precludes a similar notion of user ID, this constraint is not implemented in
Fuchsia.
Furthermore, Linux's implementation of SO_REUSEPORT results in inconsistent
behavior depending on whether the option is set on a socket before calling
bind and then cleared after, or not set at all on the socket. This dependence
on invisible system state, together with the poorly defined behavior, informed
the decision not to emulate Linux's behavior.
More details are available in https://fxbug.dev/42051599.