This makes it easier for the tests (and supervisor) to guarantee to be
able to connect to the server socket.
Also this patch moves freeing the mirror supervisor into the server
thread.
server startup (sparse write avoidance doesn't happen until it is finished).
Added mutex to bitset functions, which were already being called from
multiple threads. Rewrote allocation map builder to request file
information in multiple chunks, to avoid uninterruptible wait and dynamic
memory allocation.
This prevents the supervisor from thinking that the migration completed
successfully.
In order to do this, I've introduced a new lock around the start (and
finish) of the migration so that we avoid a race between the signal
handler in the server_accept loop and the control thread mirror startup.
Without that, we'd risk successfully starting a migration after the
SIGTERM handler fired, which would be Bad.
Building the allocation map takes time, which scales with the size of the disc
being presented. By building that map in the space between bind() and accept(),
we leave the process in a useless state after the only good signal we have for
"we are ready" and the state where it is actually ready. This was breaking
migrations of large files.
The three-way hand-off has a problem: there's no way to arrange for the
state of the migration to be unambiguous in case of failure. If the
final "disconnect" message is lost (as in, the destination never
receives it whether it is sent by the sender or not), the destination
has no option but to quit with an error status and let a human sort it
out. However, at that point we can either arrange to have a .INCOMPLETE
file still on disc or not - and it doesn't matter which we choose, we
can still end up with dataloss by picking a specific calamity to have
befallen the sender.
Given this, it makes sense to fall back to a simpler protocol: just send
all the data, then send a "disconnect" message. This has the same
downside that we need a human to sort out specific failure cases, but
combined with --unlink before sending "disconnect" (see next patch) it
will always be possible for a human to disambiguate, whether the
destination quit with an error status or not.
Changing behaviour so that instead of rebinding after a successful
migration and continuing as an ordinary server, we simply quit with a
0 exit code and let our caller restart us as a server if they want to.
This means that everything in listen.c, listen.h, and anything making
reference to a rebind address is unneeded.