Currently, we prevent clients from processing requests by taking
the server I/O lock. This leads to requests hanging for a long
time before being terminated when the migration completes, which
is not ideal. With this change, at the start of the final pass,
existing clients are closed and any new connections will be closed
immediately (so no NBD server handshake will be seen).
This is part of the work required to remove remove the server I/O
lock completely.
We're not actually using it in production right now because it doesn't
shut its sockets down cleanly enough. This is a better option than
reverting the functionality or keeping production downgraded until
we sort out a handler that cleanly closes the sockets.
In the event of a fiemap ioctl failing (when the file is on a tmpfs,
for instance), we would free() serve->allocation_map, but it would
remain not NULL, leading to segfaults in client.c when responding to
write requests.
Keeping the free() behaviour is more hassle than it's worth, as there
are synchronization problems with setting serve->allocation_map to
NULL, so we just omit the free() instead to avoid the segfault. This
is safe because we never consult the map until allocation_map_built is
set to true, and we never do that when the builder thread fails.
This makes it easier for the tests (and supervisor) to guarantee to be
able to connect to the server socket.
Also this patch moves freeing the mirror supervisor into the server
thread.
server startup (sparse write avoidance doesn't happen until it is finished).
Added mutex to bitset functions, which were already being called from
multiple threads. Rewrote allocation map builder to request file
information in multiple chunks, to avoid uninterruptible wait and dynamic
memory allocation.
This prevents the supervisor from thinking that the migration completed
successfully.
In order to do this, I've introduced a new lock around the start (and
finish) of the migration so that we avoid a race between the signal
handler in the server_accept loop and the control thread mirror startup.
Without that, we'd risk successfully starting a migration after the
SIGTERM handler fired, which would be Bad.
Building the allocation map takes time, which scales with the size of the disc
being presented. By building that map in the space between bind() and accept(),
we leave the process in a useless state after the only good signal we have for
"we are ready" and the state where it is actually ready. This was breaking
migrations of large files.
The three-way hand-off has a problem: there's no way to arrange for the
state of the migration to be unambiguous in case of failure. If the
final "disconnect" message is lost (as in, the destination never
receives it whether it is sent by the sender or not), the destination
has no option but to quit with an error status and let a human sort it
out. However, at that point we can either arrange to have a .INCOMPLETE
file still on disc or not - and it doesn't matter which we choose, we
can still end up with dataloss by picking a specific calamity to have
befallen the sender.
Given this, it makes sense to fall back to a simpler protocol: just send
all the data, then send a "disconnect" message. This has the same
downside that we need a human to sort out specific failure cases, but
combined with --unlink before sending "disconnect" (see next patch) it
will always be possible for a human to disambiguate, whether the
destination quit with an error status or not.
This is important because if we try to rebind after a migration and
someone else is in the way, any clients trying to reconnect to us will
instead be connecting to the squatter.
When we receive a migration, if rebinding to the new listen address and
port fails for a reason which might be fixable, rather than killing the
server we retry once a second. Also in this patch: non-overlapping log
messages and a fix for the client going away halfway through a sendfile
loop.
Now that we have 3 mutexes lying around, it's important that we check
and free these if necessary if error() is called in any thread that can
hold them. To do this, we now have flexthread.c, which defines a
flexthread_mutex struct. This is a wrapper around a pthread_mutex_t and
a pthread_t. The idea is that in the error handler, the thread can
check whether it holds the mutex and can free it if and only if it does.
This is important because pthread fast mutexes can be freed by *any*
thread, not just the thread which holds them.
Note: it is only ever safe for a thread to check if it holds the mutex
itself. It is *never* safe to check if another thread holds a mutex
without first locking that mutex, which makes the whole operation rather
pointless.
First, Leaving off the source address caused a segfault in the
command-sending process because there was no NULL check on the ARGV
entry.
Second, while the migration thread sent a signal to the server to close
on successful completion, it didn't wait until the close actually
happened before releasing the IO lock. This meant that any client
thread waiting on that IO lock could have a read or a write queued up
which could succeed despite the server shutdown. This would have meant
dataloss as the guest would see a successful write to the wrong instance
of the file. This patch adds a noddy serve_wait_for_close() function
which the mirror_runner calls to ensure that any clients will reject
operations they're waiting to complete.
This patch also adds a simple scenario test for migration, and fixes
TempFileWriter#read_original.
trouble and into predictable cleanup functions (one for each of serve,
client & control contexts). We use 'fatal' to mean 'kill the thread' and
'error' to mean 'don't kill the thread', assuming some recovery action,
except I don't use error anywhere yet.