where the flamingcow roams

Syscall efficiency

System calls have come a long way since the interrupt days. vDSO has moved many simple syscalls (e.g. getpid()) into pure userspace, making them as fast as indirect function calls.

That still doesn't work for I/O, though. Arkanis has some simple measurements that show the difference between getpid() and read(), and it's clear that minimizing read() calls is still worth the effort.

So, how's the world doing on that? Here's h2load, an HTTP/2 load testing client that clearly has a reason for a performance focus, connecting and making 3 requests to a server:

socket(AF_INET6, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 5
setsockopt(5, SOL_TCP, TCP_NODELAY, [1], 4) = 0
connect(5, {sa_family=AF_INET6, sin6_port=htons(443), inet_pton(AF_INET6, "...", &sin6_addr), sin6_flowinfo=htonl(0), sin6_scope_id=0}, 28) = -1 EINPROGRESS (Operation now in progress)
epoll_ctl(4, EPOLL_CTL_ADD, 5, {EPOLLOUT, {u32=5, u64=4294967301}}) = 0
epoll_wait(4, [{EPOLLOUT, {u32=5, u64=4294967301}}], 64, 59743) = 1
getsockopt(5, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
write(5, "\26\3\1\2\0\1\0\1\374\3\3$Xo\3a5:I\275\351Ge;\216b\201\300\312.\202\326"..., 517) = 517
read(5, 0x7fd01ac32ac3, 5)  = -1 EAGAIN (Resource temporarily unavailable)
epoll_ctl(4, EPOLL_CTL_MOD, 5, {EPOLLIN, {u32=5, u64=8589934597}}) = 0
epoll_wait(4, [{EPOLLIN, {u32=5, u64=8589934597}}], 64, 59743) = 1
read(5, "\26\3\3\0z", 5)    = 5
read(5, "\2\0\0v\3\3\233\317P\30\33f\17\3772\6K\2446\267N\304\313\232\177\364\224\n\334\37\324{"..., 122) = 122
read(5, "\24\3\3\0\1", 5)   = 5
read(5, "\1", 1)            = 1
read(5, "\27\3\3\0 ", 5)    = 5
read(5, "8\315\331bnq\370w\346+j~\310\330\213\30\17\301~9\200ezU:\321\228\6\274\365\212", 32) = 32
read(5, "\27\3\3\n!", 5)    = 5
read(5, "\265\366\242\vqr\241\260\373G\325\370\330\227A\376e\243JY\317\340\226x}Fo\\\364=\306\306"..., 2593) = 2593
read(5, "\27\3\3\1\31", 5)  = 5
read(5, "&\356\321\337\346\240q\317\n\307^\23\362\1\35]D\304gc\334\302\367\340\34C\273\313\316\307yo"..., 281) = 281
read(5, "\27\3\3\0E", 5)    = 5
read(5, "\246m\377@\212@\244\276#vB\211\362\277\23-\17>\324c\237\5\265\265<\370\276\374\20k\353\352"..., 69) = 69
write(5, "\24\3\3\0\1\1\27\3\3\0E)\235\322 \3\247N\216\202\300+\200\25K\177\341\376\300\212\235k"..., 80) = 80
write(5, "\27\3\3\0\200\314Z.\260t2\0\336\315\351L&\256\221\344\222\2453;\201\n\223\305\r\27=\260"..., 133) = 133
epoll_wait(4, [{EPOLLIN, {u32=5, u64=8589934597}}], 64, 59743) = 1
read(5, "\27\3\3\0J", 5)    = 5
read(5, "\325\27U^\346\20\2\320\266h\215\356DV\168\300K+\265\f\316|\355\337h\361\177\242V\213\271"..., 74) = 74
read(5, "\27\3\3\0J", 5)    = 5
read(5, "\0012/\311\351g\270\201\3176\273\245t\230[\257\336j3,\377\365\356\213W\225\301K\304\371\334l"..., 74) = 74
read(5, "\27\3\3\09", 5)    = 5
read(5, "yz,\375\306\0177T-\313M\7\256\376\r\vpY\215\222m\233\16\n\364\345\201\235\302\n\276\255"..., 57) = 57
read(5, "\27\3\3\0\32", 5)  = 5
read(5, "\346^[\215\243iA\324\343\37\200\245\311\270\331S\332\243\311\261\236\325\374\237\v\247", 26) = 26
read(5, 0x7fd01acb91c3, 5)  = -1 EAGAIN (Resource temporarily unavailable)
write(5, "\27\3\3\0\32\377Imr\325\234U\307Ez\253Un\346\276\6P\256\262\214\260\324J\250,\274", 31) = 31
epoll_wait(4, [{EPOLLIN, {u32=5, u64=8589934597}}], 64, 59743) = 1
read(5, "\27\3\3\0s", 5)    = 5
read(5, "\225\322\26\225\266\356\305hL\6\232)_\242_\310P\32H|zL\277\351c?\317\7G\252Z]"..., 115) = 115
read(5, 0x7fd01acb91c3, 5)  = -1 EAGAIN (Resource temporarily unavailable)
write(5, "\27\3\3\0%L\334\214\234\245X}1\213\4\0\3*bky\370:\344\357T\316\360\342\f\243;"..., 42) = 42
epoll_wait(4, [{EPOLLIN, {u32=5, u64=8589934597}}], 64, 59743) = 1
read(5, "\27\3\3\0s", 5)    = 5
read(5, "\351+\3gyf\305.V\272\373\354\240\231UI\222\25oq\244\200\0255c\221\301\t\2m\273\326"..., 115) = 115
read(5, 0x7fd01acb91c3, 5)  = -1 EAGAIN (Resource temporarily unavailable)
write(5, "\27\3\3\0%\300\337r\tR\252O,2\223B\256b\222\271\5\322\265\227\243\327\304G\10\272_3"..., 42) = 42
epoll_wait(4, [{EPOLLIN, {u32=5, u64=8589934597}}], 64, 59743) = 1
read(5, "\27\3\3\0s", 5)    = 5
read(5, "\323D\n\313\323\2\353q\324\210\317\213\30\2550\3738L\200D<\326\214b\366\24\335\"\31\311w\223"..., 115) = 115
read(5, 0x7fd01acb91c3, 5)  = -1 EAGAIN (Resource temporarily unavailable)
write(5, "\27\3\3\0\"\217P\25^*\232\311vCy=u\207;~\"Gf\203h\247\25m&\255\272`"..., 39) = 39
write(5, "\27\3\3\0\23\234SF\221\214\213\7<\336`\316\252\315\307\204\3\342AL", 24) = 24
shutdown(5, SHUT_WR)        = 0
close(5)                    = 0

(╯°□°)╯︵ ┻━┻

If you don't spend a lot of time reading strace output, that may not look odd or bad. It's a complete mess, though. Here's what stands out:

Issues like this frequently creep in with abstractions. Attempts to separate layers of parser logic, or to make the OpenSSL library API sensible, require you to do work in discrete chunks. It's possible to work around much of it with good buffer usage, but it can complicate things.

Want to avoid this in your code? Some suggestions: