The idea behind this feature was to avoid the client thread in a listen
server getting stuck forever if the mirroring thread in the source died.
However, it breaks any sane implementation of max_Bps in that thread,
and there are lingering concerns over how it might operate under normal
conditions anyway.
Specifically, if iterating over the bitmap takes a long time, or even just
reading the requisite 8MB from the disc in order to send it, then the
5-second timeout could be hit, causing mirroring to fail unnecessarily.
While the mirror mutex is taken, the mirroring can be abandoned and serve->mirror
set to NULL, so we need to lock around reading information from serve->mirror
Since the vast majority (something like 94% on boot) are sequential small
reads, and since network latency is a major factor in determining how fast the
exposed device appears to the client, it makes sense for us to try to minimise
the number of network requests where we safely can.
This patch implements the simplest possible read cache in flexnbd-proxy. When
it receives a read request, if it's a small request then flexnbd-proxy will
double the length of data requested. On receiving the data from the upstream
server, flexnbd-proxy will return the first half to the downstream as normal,
and stash the second half in a buffer. If the very next request is a read, and
the offset and length match those of what we have stored, that second request
will be satisfied from the buffer without going out over the network.
The cache is invalidated by any non-read request, or by a disconnection.
This makes it easier for the tests (and supervisor) to guarantee to be
able to connect to the server socket.
Also this patch moves freeing the mirror supervisor into the server
thread.
This prevents the supervisor from thinking that the migration completed
successfully.
In order to do this, I've introduced a new lock around the start (and
finish) of the migration so that we avoid a race between the signal
handler in the server_accept loop and the control thread mirror startup.
Without that, we'd risk successfully starting a migration after the
SIGTERM handler fired, which would be Bad.