Commit Graph

39 Commits

Author SHA1 Message Date
Alex Young
c3c621f750 Don't free a client which hasn't finished yet. 2012-08-23 17:51:19 +01:00
Alex Young
c5dfe16f35 Don't close the same file descriptor more than once. 2012-08-23 16:01:37 +01:00
Alex Young
33f95e1986 Add the --unlink option to mirror
This deletes the local file before tearing down the mirror connection,
allowing us to avoid an ambiguous recovery situation.
2012-07-23 13:39:27 +01:00
Alex Young
fd935ce4c9 Simplify the migration handover protocol
The three-way hand-off has a problem: there's no way to arrange for the
state of the migration to be unambiguous in case of failure.  If the
final "disconnect" message is lost (as in, the destination never
receives it whether it is sent by the sender or not), the destination
has no option but to quit with an error status and let a human sort it
out.  However, at that point we can either arrange to have a .INCOMPLETE
file still on disc or not - and it doesn't matter which we choose, we
can still end up with dataloss by picking a specific calamity to have
befallen the sender.

Given this, it makes sense to fall back to a simpler protocol: just send
all the data, then send a "disconnect" message.  This has the same
downside that we need a human to sort out specific failure cases, but
combined with --unlink before sending "disconnect" (see next patch) it
will always be possible for a human to disambiguate, whether the
destination quit with an error status or not.
2012-07-23 10:22:25 +01:00
Alex Young
d0b39cce08 Flush bad write data from the client socket.
If the client makes a write that's out of range, by the time we get to
validate the message at the server end the client has already stuffed
the socket with data we can't use, so we have to flush it.

This patch also fixes a potential problem in the acceptance tests where
the error field was being returned as an array rather than a value.
2012-07-15 23:19:12 +01:00
Alex Young
f9baa95b0f Raise the log level of a write-request-out-of-range
Without this, the error you get is a "Bad magic", when the next read
loop tries to read write data as a request.  This should be flushed from
the socket (although *when* is an open question), but upping the log
level at least gives us a more informative output.
2012-07-14 17:27:13 +01:00
Alex Young
1ce1003d3d Error when reading sent data fails
If the client cuts off part-way through the write, it should cause an
error, not a fatal.  Previously this happened if the open file had a
fiemap, but not if there was no allocation map.  This patch fixes that,
along with an associated valgrind error.
2012-07-14 12:10:12 +01:00
Alex Young
c6e6952def Open files with O_DIRECT dependent on a compile-time DIRECT_IO #define.
O_DIRECT causes problems on (at least) a wheezy VM, and there are mixed
reports about its performance impact.  This patch makes it a
compile-time choice which should remain until it's been benchmarked.
2012-07-14 10:07:58 +01:00
Alex Young
40101e49f3 Silence a vfprintf valgrind error
Turns out that %lld causes valgrind to find an uninitialised variable
problem inside vfprintf.  Avoid it here by s/%lld/%d/.
2012-07-13 11:57:46 +01:00
Alex Young
2e4e592c08 Enable writing after the 2G boundary
This patch fixes a bug in readwrite.c which truncated the 'from' field
in nbd requests.  It was casting them down from an off64_t to an int.
2012-07-12 18:01:10 +01:00
Alex Young
10b46beeea Retry failed rebind attempts
When we receive a migration, if rebinding to the new listen address and
port fails for a reason which might be fixable, rather than killing the
server we retry once a second.  Also in this patch: non-overlapping log
messages and a fix for the client going away halfway through a sendfile
loop.
2012-07-12 14:14:46 +01:00
Alex Young
eb90308b6e Handle a failed disconnect correctly
If the sender disconnects its socket before sending the disconnect
message, the destination should restart the migration process.  This
patch makes sure that happens.
2012-07-12 09:39:39 +01:00
Alex Young
f3f017a87d Free all possibly held mutexes in error handlers
Now that we have 3 mutexes lying around, it's important that we check
and free these if necessary if error() is called in any thread that can
hold them.  To do this, we now have flexthread.c, which defines a
flexthread_mutex struct.  This is a wrapper around a pthread_mutex_t and
a pthread_t.  The idea is that in the error handler, the thread can
check whether it holds the mutex and can free it if and only if it does.
This is important because pthread fast mutexes can be freed by *any*
thread, not just the thread which holds them.

Note: it is only ever safe for a thread to check if it holds the mutex
itself.  It is *never* safe to check if another thread holds a mutex
without first locking that mutex, which makes the whole operation rather
pointless.
2012-07-11 09:43:16 +01:00
Alex Young
d16aebf36e Test that a disconnect after the write request but before the data is an error 2012-07-03 15:25:39 +01:00
Alex Young
c9fdd5a60e Handle ECONNRESET during a read request 2012-06-28 11:46:02 +01:00
Alex Young
94b4fa887c Add mboxes 2012-06-27 15:45:33 +01:00
Alex Young
2078d17053 connect failure scenarios 2012-06-22 10:05:41 +01:00
Alex Young
f37a217cb9 Add listen mode 2012-06-21 18:01:50 +01:00
Alex Young
e21beb1866 Add the REQUEST_ENTRUST nbd request type 2012-06-21 17:12:06 +01:00
Alex Young
4e8a9670e5 Merge 2012-06-21 11:37:18 +01:00
Alex Young
ed3090d6d5 Tweak struct initialisation to squash a valgrind error 2012-06-21 10:29:06 +01:00
Alex Young
c7525f87dc Removed proxying completely and fixed the pthread_join bug revealed in the process 2012-06-12 15:08:07 +01:00
Alex Young
2a71b4e7a4 Fix broken error checking around pthread functions 2012-06-11 16:08:19 +01:00
Alex Young
710d8254d4 Make sure all ifs are braced 2012-06-11 14:34:17 +01:00
Alex Young
25fc0969cf Make the compiler stricter and tidy up code to make the subsequent errors and warnings go away 2012-06-11 13:57:03 +01:00
Matthew Bloch
e8b5fae7ab Merge, just renaming old error macros. 2012-06-09 02:37:23 +01:00
Matthew Bloch
b546539ab8 Rewrote error & log functions to be more general, use longjmp to get out of
trouble and into predictable cleanup functions (one for each of serve,
client & control contexts).  We use 'fatal' to mean 'kill the thread' and
'error' to mean 'don't kill the thread', assuming some recovery action,
except I don't use error anywhere yet.
2012-06-09 02:25:12 +01:00
Alex Young
b7096ef908 Audit client connections on acl update 2012-06-08 18:03:41 +01:00
Alex Young
1cd8f4660f Merge of doom 2012-06-07 14:40:55 +01:00
Alex Young
5930f25034 Use client stop signals for thread stopping 2012-06-07 14:25:30 +01:00
Matthew Bloch
40f0f9fab6 Big bit of debug output in write_not_zeroes (disabled). 2012-06-07 12:28:21 +01:00
Alex Young
a90f84972b Add stop signals to client threads 2012-06-07 11:44:19 +01:00
Matthew Bloch
5710431780 Refactored write_not_zeroes to use struct bitset_mapping instead of
repeating all that code (has not fixed earlier bug yet, but lots of
repetition cut).
2012-06-07 11:17:02 +01:00
Alex Young
cfa9f9c71f Fix the sense of client_serve_request 2012-06-06 14:25:35 +01:00
Alex Young
16001eb9eb Move checking for a closed client out of server_lock_io and into client_serve_request 2012-06-06 13:44:38 +01:00
Alex Young
1b289a0e87 Change io lock and unlock to server error on failure 2012-06-06 13:29:13 +01:00
Alex Young
339e766339 Use self_pipe for close_signal 2012-06-06 12:41:03 +01:00
Alex Young
457987664a Renamed struct client_params to struct client 2012-06-06 11:33:17 +01:00
Alex Young
40279bc9ca Split client-specific code into client.{c,h} 2012-06-06 11:27:52 +01:00