The three-way hand-off has a problem: there's no way to arrange for the
state of the migration to be unambiguous in case of failure. If the
final "disconnect" message is lost (as in, the destination never
receives it whether it is sent by the sender or not), the destination
has no option but to quit with an error status and let a human sort it
out. However, at that point we can either arrange to have a .INCOMPLETE
file still on disc or not - and it doesn't matter which we choose, we
can still end up with dataloss by picking a specific calamity to have
befallen the sender.
Given this, it makes sense to fall back to a simpler protocol: just send
all the data, then send a "disconnect" message. This has the same
downside that we need a human to sort out specific failure cases, but
combined with --unlink before sending "disconnect" (see next patch) it
will always be possible for a human to disambiguate, whether the
destination quit with an error status or not.
Changing behaviour so that instead of rebinding after a successful
migration and continuing as an ordinary server, we simply quit with a
0 exit code and let our caller restart us as a server if they want to.
This means that everything in listen.c, listen.h, and anything making
reference to a rebind address is unneeded.
This is important because if we try to rebind after a migration and
someone else is in the way, any clients trying to reconnect to us will
instead be connecting to the squatter.
If a second mirror command is run while the first is still going,
flexnbd needs to prevent the second because we only have one dirty map.
Also, the shutdown becomes Complicated if we allow more than one mirror
at a time.
If the client makes a write that's out of range, by the time we get to
validate the message at the server end the client has already stuffed
the socket with data we can't use, so we have to flush it.
This patch also fixes a potential problem in the acceptance tests where
the error field was being returned as an array rather than a value.
The mirror_super signals the commit state to the control thread via an
mbox, and this mbox is moved to control. It was owned by mirror_super,
but the problem with that is that mirror_super can free the mbox before
the control client has been scheduled to receive the message. If it's
owned by the control object, that can't happen.
If the mirror attempt failed and we were able to report an error to the
user, it makes no sense to attempt a retry. We don't have a way to
abort a mirror attempt yet, so if the user got a setting wrong and it's
failing for that reason, the only recourse they'd have would be to
restart the server.
At the moment, a first-pass failed migration will retry. This is wrong,
it should abort. However, to make that happen the mirror supervisor
needs to know the commit state of the mirror thread. With a self_pipe
mirror commit signal that information wasn't there.
If the mirror attempt connects ok, but is rejected (say, for reporting
the wrong size), the client socket needs to be closed. The destination
end can't close its socket and accept another connection attempt unless
it does.
Without this, the error you get is a "Bad magic", when the next read
loop tries to read write data as a request. This should be flushed from
the socket (although *when* is an open question), but upping the log
level at least gives us a more informative output.
Previously, the --verbose flag was only present in debug builds. Now
it's present whether you define DEBUG or not. What changes is the
amount of information printed to stderr: DEBUG sets the --verbose log
level to 0 (debug), while DEBUG unset sets it to 1 (info). This makes
driving the binary slightly simpler as you don't have to detect whether
it's a debug build by scanning for "--verbose" in the help output.
If the client cuts off part-way through the write, it should cause an
error, not a fatal. Previously this happened if the open file had a
fiemap, but not if there was no allocation map. This patch fixes that,
along with an associated valgrind error.
O_DIRECT causes problems on (at least) a wheezy VM, and there are mixed
reports about its performance impact. This patch makes it a
compile-time choice which should remain until it's been benchmarked.
Previously, the behaviour was to unlink any control socket sat where we
wanted to open ours. This would make us lose control of running servers
if we happened to collide accidentally. With this patch, the new
process will abort() if there is a control socket squatting on the
path we want, and unlink it when it closes.
This means that an unclean shutdown will leave a dangling, unattached
control socket which will block a restart, but that's a better option
than intentionally cutting off running servers.
This simplifies building the log output because it means we don't have
to malloc a buffer to append a newline, and we keep the atomic write
property we're after. It also takes advantage of the C constant string
concatenation which we already require to work to prepend the thread and
pid data.
When we receive a migration, if rebinding to the new listen address and
port fails for a reason which might be fixable, rather than killing the
server we retry once a second. Also in this patch: non-overlapping log
messages and a fix for the client going away halfway through a sendfile
loop.
If the sender disconnects its socket before sending the disconnect
message, the destination should restart the migration process. This
patch makes sure that happens.