We're not actually using it in production right now because it doesn't
shut its sockets down cleanly enough. This is a better option than
reverting the functionality or keeping production downgraded until
we sort out a handler that cleanly closes the sockets.
It's a little more complicated than that, actually. For the various
states that involve reading from, or writing to, the upstream fd,
if the amount of time spent in that state is > 30 seconds, we reconnect
to the server and resend the request.
We also introduce a 15-second reconnect dampener to keep us from stressing
things unduly. This may need to be decreased, or turned into an exponential
backoff, at some point.
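Roughly, the timeout check works like this; a minimal sketch in C, with
hypothetical names (proxy_connect_upstream, proxy_resend_request, struct
proxy) standing in for the real state-machine code:

    #include <stdbool.h>
    #include <time.h>

    #define UPSTREAM_TIMEOUT_SECS 30
    #define RECONNECT_DAMPEN_SECS 15

    struct proxy {
        time_t state_entered;  /* when we entered the current read/write state */
        time_t last_reconnect; /* when we last reconnected to the upstream */
    };

    /* Hypothetical helpers standing in for the real proxy internals. */
    bool proxy_connect_upstream(struct proxy *p);
    bool proxy_resend_request(struct proxy *p);

    /* Called from any state that reads from or writes to the upstream fd.
     * If we've been stuck there for more than 30 seconds, drop the
     * connection and resend the request; the 15-second dampener stops us
     * hammering a struggling server with reconnect attempts. */
    static bool check_upstream_timeout(struct proxy *p)
    {
        time_t now = time(NULL);

        if (now - p->state_entered <= UPSTREAM_TIMEOUT_SECS)
            return true;  /* not timed out yet; keep waiting */

        if (now - p->last_reconnect < RECONNECT_DAMPEN_SECS)
            return true;  /* dampened: too soon to retry */

        p->last_reconnect = now;
        if (!proxy_connect_upstream(p))
            return false;

        p->state_entered = now;
        return proxy_resend_request(p);
    }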
Building with -DPREFETCH is currently broken, I'm sure, but otherwise
this version seems to be feature-complete compared to the previous one,
albeit wordier. Upcoming: cleanups.
We define a hang as 120 seconds for now; that should be OK (famous last words).
When I say unclean, I mean it; the control socket is left hanging around too.
This is a workaround for the fact that the client can hang the whole server by
sending a write request header specifying > 0 bytes, then uncleanly going away.
On the server side, we acquire the IO mutex, and then try to read > 0 bytes from
the socket; the data never arrives, and when the client reconnects, its requests
never get a response (since we're waiting on that mutex). Getting rid of that
mutex (which isn't actually needed, except for migration) would be better.
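For the record, the shape of the hang is something like this; a sketch with
made-up names, not the actual server code:

    #include <pthread.h>
    #include <stdint.h>
    #include <stdlib.h>
    #include <unistd.h>

    static pthread_mutex_t io_mutex = PTHREAD_MUTEX_INITIALIZER;

    static void handle_write_request(int client_fd, uint64_t len)
    {
        char *buf = malloc(len);
        if (!buf)
            return;

        pthread_mutex_lock(&io_mutex);

        /* The header promised len > 0 bytes of payload. If the client
         * goes away uncleanly (no FIN, no RST ever arrives), this
         * blocking read never returns, and every other client's handler
         * queues up behind io_mutex, so the whole server appears hung. */
        size_t got = 0;
        while (got < len) {
            ssize_t n = read(client_fd, buf + got, len - got);
            if (n <= 0)
                break; /* a clean disconnect does get us out of here */
            got += n;
        }

        /* ... write buf to the disc, send the reply ... */

        pthread_mutex_unlock(&io_mutex);
        free(buf);
    }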
Since the vast majority of requests (something like 94% on boot) are sequential small
reads, and since network latency is a major factor in determining how fast the
exposed device appears to the client, it makes sense for us to try to minimise
the number of network requests where we safely can.
This patch implements the simplest possible read cache in flexnbd-proxy. When
it receives a read request, if it's a small request then flexnbd-proxy will
double the length of data requested. On receiving the data from the upstream
server, flexnbd-proxy will return the first half to the downstream as normal,
and stash the second half in a buffer. If the very next request is a read, and
the offset and length match those of the stashed data, that second request
will be satisfied from the buffer without going out over the network.
The cache is invalidated by any non-read request, or by a disconnection.
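The cache itself is tiny; roughly this, in C with hypothetical names and a
made-up size threshold (the real definition of "small" lives in the prefetch
code):

    #include <stdbool.h>
    #include <stdint.h>
    #include <string.h>

    #define PREFETCH_MAX_LEN 4096 /* assumed "small" threshold, not the real value */

    struct prefetch_cache {
        bool     valid;
        uint64_t from; /* offset of the stashed second half */
        uint32_t len;  /* length of the stashed second half */
        char     data[PREFETCH_MAX_LEN];
    };

    /* Before going upstream: can we answer this read from the stash? */
    static bool prefetch_hit(struct prefetch_cache *c, uint64_t from, uint32_t len)
    {
        return c->valid && c->from == from && c->len == len;
    }

    /* After a doubled read returns: keep the second half for next time.
     * req_from/req_len describe the client's original (undoubled) request;
     * reply holds the full doubled payload from the upstream. */
    static void prefetch_stash(struct prefetch_cache *c, uint64_t req_from,
                               uint32_t req_len, const char *reply)
    {
        if (req_len > PREFETCH_MAX_LEN)
            return; /* only small reads get doubled */
        c->from  = req_from + req_len;
        c->len   = req_len;
        memcpy(c->data, reply + req_len, req_len);
        c->valid = true;
    }

    /* Any non-read request, or a disconnection, empties the cache. */
    static void prefetch_invalidate(struct prefetch_cache *c)
    {
        c->valid = false;
    }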
It's safe to terminate the proxy at any point in its lifecycle, so
there's no point using signalfd() (and the associated select() +
non-blocking I/O gubbins) in it. We might want to use non-blocking
I/O in the future for other reasons, of course, at which point it
might become sensible to use signalfd() again. For now, this makes
us reliably responsive to TERM, INT and QUIT in a way that we weren't
previously.
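The resulting handler setup is about as simple as it gets; a minimal sketch of
the idea, not the actual code:

    #include <signal.h>
    #include <string.h>
    #include <unistd.h>

    /* Since it's safe to kill the proxy at any point, the handler can
     * exit directly; no signalfd(), select() or non-blocking I/O needed.
     * _exit() is async-signal-safe, so this is legal in a handler. */
    static void bail(int sig)
    {
        (void) sig;
        _exit(0);
    }

    /* Wire TERM, INT and QUIT straight to termination. */
    static void install_signal_handlers(void)
    {
        struct sigaction sa;
        memset(&sa, 0, sizeof(sa));
        sa.sa_handler = bail;
        sigemptyset(&sa.sa_mask);

        sigaction(SIGTERM, &sa, NULL);
        sigaction(SIGINT,  &sa, NULL);
        sigaction(SIGQUIT, &sa, NULL);
    }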