Building the allocation map takes time, which scales with the size of the disc
being presented. By building that map in the space between bind() and accept(),
we leave the process in a useless state after the only good signal we have for
"we are ready" and the state where it is actually ready. This was breaking
migrations of large files.
The three-way hand-off has a problem: there's no way to arrange for the
state of the migration to be unambiguous in case of failure. If the
final "disconnect" message is lost (as in, the destination never
receives it whether it is sent by the sender or not), the destination
has no option but to quit with an error status and let a human sort it
out. However, at that point we can either arrange to have a .INCOMPLETE
file still on disc or not - and it doesn't matter which we choose, we
can still end up with dataloss by picking a specific calamity to have
befallen the sender.
Given this, it makes sense to fall back to a simpler protocol: just send
all the data, then send a "disconnect" message. This has the same
downside that we need a human to sort out specific failure cases, but
combined with --unlink before sending "disconnect" (see next patch) it
will always be possible for a human to disambiguate, whether the
destination quit with an error status or not.
If the client makes a write that's out of range, by the time we get to
validate the message at the server end the client has already stuffed
the socket with data we can't use, so we have to flush it.
This patch also fixes a potential problem in the acceptance tests where
the error field was being returned as an array rather than a value.
When we receive a migration, if rebinding to the new listen address and
port fails for a reason which might be fixable, rather than killing the
server we retry once a second. Also in this patch: non-overlapping log
messages and a fix for the client going away halfway through a sendfile
loop.
If the sender disconnects its socket before sending the disconnect
message, the destination should restart the migration process. This
patch makes sure that happens.
This adds a test for destination behaviour, in that if a source crashes
after sending an entrust message but before the destination can reply,
the destination must allow the source to reconnect and retry the mirror.