We're not actually using it in production right now because it doesn't
shut its sockets down cleanly enough. This is a better option than
reverting the functionality or keeping production downgraded until
we sort out a handler that cleanly closes the sockets.
We define a hang as 120 seconds for now; that should be OK (famous last words).
When I say unclean, I mean it; the control socket is left hanging around too.
This is a workaround for the fact that the client can hang the whole server by
sending a write request header specifying > 0 bytes, then uncleanly going away.
On the server side, we acquire the IO mutex and then try to read those
bytes from the socket; the data never arrives, so the stuck thread holds
the mutex indefinitely, and when the client reconnects, its requests
never get a response (the new connection's handler blocks on that same
mutex). Getting rid of the mutex (which isn't actually needed, except
for migration) would be better.
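For illustration, the hang guard could be as simple as a receive timeout
on the client socket; this is only a sketch of the idea, not necessarily
how the handler is actually implemented:

    #include <sys/socket.h>
    #include <sys/time.h>

    #define CLIENT_HANG_SECONDS 120  /* "we define a hang as 120 seconds" */

    /* Arm a receive timeout on the client socket so a blocking read() of
     * the promised write payload fails with EAGAIN/EWOULDBLOCK after 120
     * seconds instead of holding the IO mutex forever. */
    static int arm_hang_guard(int client_fd)
    {
        struct timeval tv = { .tv_sec = CLIENT_HANG_SECONDS, .tv_usec = 0 };
        return setsockopt(client_fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
    }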
In the event of a fiemap ioctl failing (when the file is on a tmpfs,
for instance), we would free() serve->allocation_map but leave the
pointer non-NULL, leading to segfaults in client.c when responding to
write requests.
Keeping the free() behaviour is more hassle than it's worth, as there
are synchronization problems with setting serve->allocation_map to
NULL, so we just omit the free() instead to avoid the segfault. This is
safe because we never consult the map until allocation_map_built is set
to true, and the builder thread never sets it when it fails.
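For illustration, the guard amounts to something like the following
(field and helper names are assumptions, not the real client.c code):

    #include <stdint.h>

    struct bitset;                        /* the allocation map */

    struct server {
        struct bitset *allocation_map;    /* may be stale if the build failed */
        int allocation_map_built;         /* set to 1 only on a successful build */
    };

    int bitset_is_clear(struct bitset *map, uint64_t from, uint64_t len);

    /* Returns 1 if [from, from+len) is unallocated, so an all-zero write
     * there can be skipped to keep the file sparse.  The map pointer is
     * only dereferenced behind allocation_map_built, which a failed
     * builder thread never sets, so a stale pointer is never touched. */
    static int can_skip_sparse_write(struct server *serve, uint64_t from, uint64_t len)
    {
        if (!serve->allocation_map_built)
            return 0;    /* no trustworthy map: write unconditionally */
        return bitset_is_clear(serve->allocation_map, from, len);
    }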
The allocation map is now built in a separate thread, so building it no
longer delays server startup (sparse write avoidance doesn't happen
until it is finished).
Added mutex to bitset functions, which were already being called from
multiple threads. Rewrote allocation map builder to request file
information in multiple chunks, to avoid an uninterruptible wait and
dynamic memory allocation.
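For illustration, chunked fiemap requests might look roughly like this
(the chunk sizes, names and fixed extent buffer are assumptions; the
real builder would also need to cope with a chunk containing more
extents than the buffer holds):

    #include <stdint.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <linux/fs.h>
    #include <linux/fiemap.h>

    #define MAP_CHUNK_BYTES   (64ULL << 20)  /* query 64MiB of the file at a time */
    #define MAP_CHUNK_EXTENTS 256            /* fixed buffer: no dynamic allocation */

    int build_allocation_map(int fd, uint64_t file_size,
                             void (*mark_allocated)(uint64_t from, uint64_t len))
    {
        struct {
            struct fiemap        fm;
            struct fiemap_extent extents[MAP_CHUNK_EXTENTS];
        } req;

        for (uint64_t offset = 0; offset < file_size; offset += MAP_CHUNK_BYTES) {
            memset(&req, 0, sizeof(req));
            req.fm.fm_start = offset;
            req.fm.fm_length = MAP_CHUNK_BYTES;
            req.fm.fm_flags = FIEMAP_FLAG_SYNC;
            req.fm.fm_extent_count = MAP_CHUNK_EXTENTS;

            if (ioctl(fd, FS_IOC_FIEMAP, &req.fm) < 0)
                return -1;  /* e.g. tmpfs: caller leaves allocation_map_built unset */

            for (uint32_t i = 0; i < req.fm.fm_mapped_extents; i++)
                mark_allocated(req.extents[i].fe_logical,
                               req.extents[i].fe_length);
        }
        return 0;
    }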
The three-way hand-off has a problem: there's no way to arrange for the
state of the migration to be unambiguous in case of failure. If the
final "disconnect" message is lost (as in, the destination never
receives it whether it is sent by the sender or not), the destination
has no option but to quit with an error status and let a human sort it
out. However, at that point we can either arrange to have a .INCOMPLETE
file still on disc or not, and it doesn't matter which we choose: for
either option there is some calamity that could have befallen the sender
which still ends in data loss.
Given this, it makes sense to fall back to a simpler protocol: just send
all the data, then send a "disconnect" message. This has the same
downside that we need a human to sort out specific failure cases, but
combined with --unlink before sending "disconnect" (see next patch) it
will always be possible for a human to disambiguate, whether the
destination quit with an error status or not.
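For illustration, the sender-side sequence reduces to something like
this (send_all_blocks() and send_disconnect() are hypothetical stand-ins;
the real message framing isn't shown):

    #include <unistd.h>

    int send_all_blocks(int sock);                 /* stream the whole disc */
    int send_disconnect(int sock);                 /* the final "disconnect" */

    int hand_off(int sock, const char *filename, int do_unlink)
    {
        if (send_all_blocks(sock) < 0)
            return -1;              /* destination will restart the migration */

        if (do_unlink)
            unlink(filename);       /* --unlink (next patch): remove the source
                                       file before the disconnect, so a lost
                                       disconnect can still be disambiguated */

        return send_disconnect(sock);
    }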
If the client makes a write that's out of range, by the time we get to
validate the message at the server end the client has already stuffed
the socket with data we can't use, so we have to flush it.
This patch also fixes a potential problem in the acceptance tests where
the error field was being returned as an array rather than a value.
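For illustration, flushing the unusable payload can be as simple as
reading it into a throwaway buffer before sending the error reply (a
sketch, not the actual client.c code):

    #include <errno.h>
    #include <stdint.h>
    #include <unistd.h>

    static int drain_socket(int fd, uint64_t len)
    {
        char scratch[4096];

        while (len > 0) {
            size_t want = len < sizeof(scratch) ? (size_t)len : sizeof(scratch);
            ssize_t got = read(fd, scratch, want);

            if (got < 0 && errno == EINTR)
                continue;       /* interrupted: retry */
            if (got <= 0)
                return -1;      /* client went away: nothing left to flush */
            len -= (uint64_t)got;
        }
        return 0;               /* payload discarded; safe to send the error */
    }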
Without this, the error you get is "Bad magic" when the next read loop
tries to interpret the write payload as a request header. The payload
should be flushed from the socket (although *when* is an open question),
but upping the log level at least gives us more informative output.
If the client cuts off part-way through the write, it should cause an
error, not a fatal. Previously it did so only if the open file had a
fiemap; with no allocation map it was treated as fatal. This patch fixes
that, along with an associated valgrind error.
O_DIRECT causes problems on (at least) a wheezy VM, and there are mixed
reports about its performance impact. This patch makes it a
compile-time choice which should remain until it's been benchmarked.
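For illustration, the compile-time switch can be a single #ifdef around
the open() flags (USE_O_DIRECT and the helper name are illustrative, not
necessarily the actual build flag):

    #define _GNU_SOURCE           /* O_DIRECT is a GNU/Linux extension */
    #include <fcntl.h>

    #ifdef USE_O_DIRECT
    #  define FILE_DIRECT_FLAG O_DIRECT
    #else
    #  define FILE_DIRECT_FLAG 0
    #endif

    static int open_served_file(const char *path)
    {
        return open(path, O_RDWR | FILE_DIRECT_FLAG);
    }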
When we receive a migration, if rebinding to the new listen address and
port fails for a reason which might be fixable, rather than killing the
server we retry once a second. Also in this patch: non-overlapping log
messages and a fix for the client going away halfway through a sendfile
loop.
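For illustration, the retry loop is roughly as follows (try_bind() is a
hypothetical helper, and exactly which errno values count as "might be
fixable" is an assumption):

    #include <errno.h>
    #include <unistd.h>

    int try_bind(const char *addr, int port);      /* hypothetical */

    int bind_with_retry(const char *addr, int port)
    {
        for (;;) {
            if (try_bind(addr, port) == 0)
                return 0;
            if (errno != EADDRINUSE && errno != EADDRNOTAVAIL)
                return -1;          /* not fixable by waiting: give up */
            sleep(1);               /* otherwise retry once a second */
        }
    }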
If the sender disconnects its socket before sending the disconnect
message, the destination should restart the migration process. This
patch makes sure that happens.
Now that we have three mutexes lying around, it's important that we
check them, and free (unlock) them if necessary, when error() is called
in any thread that can hold them. To do this, we now have flexthread.c,
which defines a
flexthread_mutex struct. This is a wrapper around a pthread_mutex_t and
a pthread_t. The idea is that in the error handler, the thread can
check whether it holds the mutex and can free it if and only if it does.
This is important because pthread fast mutexes can be freed by *any*
thread, not just the thread which holds them.
Note: it is only ever safe for a thread to check if it holds the mutex
itself. It is *never* safe to check if another thread holds a mutex
without first locking that mutex, which makes the whole operation rather
pointless.
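For illustration, the wrapper amounts to recording the holder alongside
the mutex (names and details are a sketch; the real flexthread.c may
differ):

    #include <pthread.h>
    #include <string.h>

    struct flexthread_mutex {
        pthread_mutex_t mutex;
        pthread_t       holder;    /* valid only while the mutex is held */
    };

    int flexthread_mutex_lock(struct flexthread_mutex *ftm)
    {
        int rc = pthread_mutex_lock(&ftm->mutex);
        if (rc == 0)
            ftm->holder = pthread_self();
        return rc;
    }

    int flexthread_mutex_unlock(struct flexthread_mutex *ftm)
    {
        memset(&ftm->holder, 0, sizeof(ftm->holder));  /* clear before releasing */
        return pthread_mutex_unlock(&ftm->mutex);
    }

    /* Only meaningful when a thread asks about itself (see the note above). */
    int flexthread_mutex_held(struct flexthread_mutex *ftm)
    {
        return pthread_equal(ftm->holder, pthread_self());
    }

In the error handler this lets a thread unlock the mutex if and only if
flexthread_mutex_held() says the current thread holds it.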
Cleanup code has been moved out of trouble and into predictable cleanup
functions (one for each of the serve,
client & control contexts). We use 'fatal' to mean 'kill the thread' and
'error' to mean 'don't kill the thread' (assuming some recovery action),
although I don't use error anywhere yet.