v0.7: Networking in POSIX vs. io_uring 💍

@ashvardanian ashvardanian released this 07 Feb 10:57
· 112 commits to main since this release

To showcase how IO approaches differ, this release adds a batch-asynchronous echo server on top of UDP and measures the packet drop rate, throughput, and latency for three implementations:

  • ASIO
  • POSIX
  • io_uring
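For orientation, here is a minimal sketch of the plain POSIX variant of such an echo round trip. It is a simplified, synchronous ping-pong inside one process, not the batch-asynchronous code from the repository; the helper names `open_loopback_udp` and `echo_batch`, the 100 ms receive timeout, and the 2048-byte buffers are assumptions made for illustration.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: opens a UDP socket bound to 127.0.0.1 on a kernel-chosen port.
// Returns the descriptor or -1 on failure; the bound address is written to `bound`.
inline int open_loopback_udp(sockaddr_in &bound) {
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) return -1;
    std::memset(&bound, 0, sizeof(bound));
    bound.sin_family = AF_INET;
    bound.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    bound.sin_port = 0; // let the kernel pick a free port
    socklen_t len = sizeof(bound);
    if (bind(fd, reinterpret_cast<sockaddr *>(&bound), sizeof(bound)) < 0 ||
        getsockname(fd, reinterpret_cast<sockaddr *>(&bound), &len) < 0) {
        close(fd);
        return -1;
    }
    // Bound every receive, so a lost packet is counted as a drop instead of blocking forever
    timeval timeout;
    timeout.tv_sec = 0;
    timeout.tv_usec = 100000; // 100 ms, an arbitrary choice for this sketch
    setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &timeout, sizeof(timeout));
    return fd;
}

// Hypothetical helper: sends `batch` packets of `size` bytes from a client socket,
// echoes each one back from a server socket, and returns how many intact replies
// the client received. Returns -1 if the sockets could not be set up.
inline int echo_batch(int batch, std::size_t size) {
    sockaddr_in server_addr, client_addr;
    int server = open_loopback_udp(server_addr);
    int client = open_loopback_udp(client_addr);
    if (server < 0 || client < 0) {
        if (server >= 0) close(server);
        if (client >= 0) close(client);
        return -1;
    }
    char out[2048], in[2048];
    if (size > sizeof(out)) size = sizeof(out);
    std::memset(out, 'x', size);

    int received = 0;
    for (int i = 0; i < batch; ++i) {
        // Client to server
        if (sendto(client, out, size, 0, reinterpret_cast<sockaddr *>(&server_addr),
                   sizeof(server_addr)) < 0)
            continue;
        // Server: receive one datagram and send it back to whoever sent it
        sockaddr_in peer;
        socklen_t peer_len = sizeof(peer);
        ssize_t got = recvfrom(server, in, sizeof(in), 0, reinterpret_cast<sockaddr *>(&peer), &peer_len);
        if (got < 0) continue;
        sendto(server, in, static_cast<std::size_t>(got), 0, reinterpret_cast<sockaddr *>(&peer), peer_len);
        // Client: wait for the echo and verify the payload
        ssize_t back = recv(client, in, sizeof(in), 0);
        if (back == got && std::memcmp(in, out, size) == 0) ++received;
    }
    close(server);
    close(client);
    return received;
}
```

Calling `echo_batch(256, 1024)` and comparing the result against 256 gives a drop count for one batch. A real benchmark would run the server and client separately and keep many packets in flight, which is where the three implementations start to differ.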

Current numbers from a 6-core machine over loopback:

Running build_release/less_slow
Run on (6 X 4000.4 MHz CPU s)
CPU Caches:
  L1 Data 48 KiB (x6)
  L1 Instruction 32 KiB (x6)
  L2 Unified 2048 KiB (x6)
  L3 Unified 327680 KiB (x1)
Load Average: 0.93, 0.52, 0.47
----------------------------------------------------------------------------------------------------------
Benchmark                                                Time             CPU   Iterations UserCounters...
----------------------------------------------------------------------------------------------------------
rpc_libc/loopback/min_time:2.000/manual_time          5514 us         2298 us          509 bytes_per_second=45.3389Mi/s drop,%=0 items_per_second=46.427k/s max_packet_latency,us=55 mean_batch_latency,us=5.51403k mean_packet_latency,us=21.5392
rpc_uring55/loopback/min_time:2.000/manual_time       1630 us         1591 us         1727 bytes_per_second=153.366Mi/s drop,%=0 items_per_second=157.046k/s max_packet_latency,us=1.822k mean_batch_latency,us=1.63009k mean_packet_latency,us=6.36754
rpc_asio/loopback/min_time:2.000/manual_time         89058 us          878 us           28 bytes_per_second=2.80717Mi/s drop,%=12.9325 items_per_second=2.87454k/s max_packet_latency,us=916 mean_batch_latency,us=89.0576k mean_packet_latency,us=399.553
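The counters in each row can be cross-checked against each other. The sketch below is only arithmetic on the reported values; the function names are hypothetical, and the batch and packet sizes it yields are inferred from the numbers above rather than stated anywhere in this release.

```cpp
#include <cassert>
#include <cmath>

// Packets per batch implied by the throughput and the mean batch latency:
// (packets / second) * (seconds / batch)
inline double implied_batch_size(double items_per_second, double mean_batch_latency_us) {
    return items_per_second * mean_batch_latency_us * 1e-6;
}

// Bytes per packet implied by the byte and packet throughput:
// (MiB / second) * (bytes / MiB) / (packets / second)
inline double implied_packet_bytes(double mib_per_second, double items_per_second) {
    return mib_per_second * 1024.0 * 1024.0 / items_per_second;
}
```

Plugging in the `rpc_libc` row (46427 packets/s, 5514.03 us per batch, 45.3389 MiB/s) gives roughly 256 packets per batch and 1024 bytes per packet. The other two rows give the same figures, which suggests all three variants were measured with the same workload.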

The current example uses only the most basic io_uring features, available as of Linux kernel 5.5. Future iterations (#30) should extend it with the following functionality:

  • IORING_REGISTER_BUFFERS - since 5.1
  • IORING_RECV_MULTISHOT or io_uring_prep_recvmsg_multishot - since 6.0
  • IORING_OP_SEND_ZC - since 6.0, or io_uring_prep_sendmsg_zc - since 6.1
  • IORING_SETUP_SQPOLL - with IORING_FEAT_SQPOLL_NONFIXED since 5.11
  • IORING_SETUP_SUBMIT_ALL - since 5.18
  • IORING_SETUP_COOP_TASKRUN - since 5.19
  • IORING_SETUP_SINGLE_ISSUER - since 6.0
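Since these features arrive in different kernel releases, one way to gate them at run time is to parse the `uname` release string and compare it against a table. The sketch below assumes the minimum versions from the list above; the type and function names are hypothetical and not taken from the repository, and it does not touch liburing at all.

```cpp
#include <sys/utsname.h>

#include <cassert>
#include <cstdio>

struct kernel_version_t {
    int major;
    int minor;
};

// Parses the leading "major.minor" of a release string such as "6.5.0-41-generic".
inline bool parse_kernel_release(char const *release, kernel_version_t &version) {
    return std::sscanf(release, "%d.%d", &version.major, &version.minor) == 2;
}

// Queries the running kernel; returns false if `uname` fails or the string is unexpected.
inline bool running_kernel(kernel_version_t &version) {
    utsname info;
    return uname(&info) == 0 && parse_kernel_release(info.release, version);
}

inline bool at_least(kernel_version_t version, int major, int minor) {
    return version.major > major || (version.major == major && version.minor >= minor);
}

struct uring_feature_t {
    char const *name;
    int major;
    int minor;
};

// Minimum kernel versions as given in the list above (assumed, not re-verified here).
constexpr uring_feature_t uring_features[] = {
    {"IORING_REGISTER_BUFFERS", 5, 1},
    {"IORING_SETUP_SQPOLL with IORING_FEAT_SQPOLL_NONFIXED", 5, 11},
    {"IORING_SETUP_SUBMIT_ALL", 5, 18},
    {"IORING_SETUP_COOP_TASKRUN", 5, 19},
    {"IORING_RECV_MULTISHOT", 6, 0},
    {"IORING_OP_SEND_ZC", 6, 0},
    {"IORING_SETUP_SINGLE_ISSUER", 6, 0},
};

// Counts how many of the listed features the given kernel version should support.
inline int count_available(kernel_version_t version) {
    int count = 0;
    for (uring_feature_t const &feature : uring_features)
        if (at_least(version, feature.major, feature.minor)) ++count;
    return count;
}

// Prints one line per feature for the given kernel version.
inline void print_available(kernel_version_t version) {
    for (uring_feature_t const &feature : uring_features)
        std::printf("%-55s %s\n", feature.name,
                    at_least(version, feature.major, feature.minor) ? "yes" : "no");
}
```

A caller would typically run `running_kernel(version)` once at startup and pass the result to `print_available` or `at_least`. A version check is only a heuristic: distribution kernels backport features, so probing the ring itself is more reliable where a probe exists.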

Feel free to join the development 🤗

Minor

  • Add: io_uring variant for kernel 6.0 (ce73aa3)
  • Add: io_uring draft (ec28b57)
  • Add: External route networking (a2a8c9e)
  • Add: POSIX echo implementation (3cce3b9)
  • Add: ASIO "echo" server/client ping-pong (08d3326)

Patch

  • Fix: Depend io_uring compilation on kernel version (70c53f6)
  • Improve: IOSQE_FIXED_FILE for kernel 6.0+ (f7f7693)
  • Improve: ASIO benchmarks (6be216a)
  • Docs: Refactor spell-checks (24706a7)
  • Make: Order spell-checks (1358a69)
  • Docs: Recommend OpenBLAS (035e388)
  • Improve: Avoid std::format in io_uring (1857a82)
  • Fix: ARCH_ENABLE_TAGGED_ADDR needs Linux 6.2+ (3993b0c)
  • Fix: Missing openblas_set_num_threads (3cab87d)
  • Docs: Install libBLAS (7629609)
  • Improve: SO_ZEROCOPY (fd4c9e2)
  • Improve: Retrofit registering buffers in 5.5 (95de751)
  • Make: RelWithDebInfo flags (2838fd5)
  • Improve: Code styling on Windows (c9238a1)
  • Fix: Avoid in-place increment (2c25b4d)
  • Make: Disable CUDA by default (94879fd)
  • Make: Matching VERSION in CMake (b4dc186)
  • Improve: Detect Linux version (f3e91fa)
  • Improve: physical_cores for Windows refactor (0eb985c)
  • Docs: Future io_uring tasks (53c4ca6)
  • Improve: io_uring optional timeouts (f933582)
  • Make: Revert to default BLAS (b3e13dd)
  • Improve: io_uring server logic (3dfe612)
  • Fix: liburing example (a2a9d6c)
  • Improve: Reuse benchmarking logic (cae4175)
  • Improve: Manual IO timing (fc60bfd)
  • Make: Switch to PkgConfig for liburing (b4e50ad)
  • Fix: Compiling asio example (b041392)
  • Make: Tag dependencies, where possible (8f2e985)
  • Improve: Batching client/server requests (955be1d)
  • Make: liburing & asio deps (7256f98)