Commit Graph

16 Commits

Author SHA1 Message Date
nick
53cbe14556 mirror: lengthen the request timeout to 60 seconds
This is complicated slightly by a need to keep the tests fast, so
we introduce an environment variable that can override the constant
2013-10-30 22:45:12 +00:00
Alex Young
161d2fccf1 Rename serve->has_control to serve->success.
This makes the use of this variable to signal an unexpected SIGTERM
while migrating less confusing.
2012-10-09 17:20:39 +01:00
Alex Young
f3e0d61323 Quit with an error status on SIGTERM during migration
This prevents the supervisor from thinking that the migration completed
successfully.

In order to do this, I've introduced a new lock around the start (and
finish) of the migration so that we avoid a race between the signal
handler in the server_accept loop and the control thread mirror startup.
Without that, we'd risk successfully starting a migration after the
SIGTERM handler fired, which would be Bad.
2012-10-04 14:41:55 +01:00
Alex Young
fd935ce4c9 Simplify the migration handover protocol
The three-way hand-off has a problem: there's no way to arrange for the
state of the migration to be unambiguous in case of failure.  If the
final "disconnect" message is lost (as in, the destination never
receives it whether it is sent by the sender or not), the destination
has no option but to quit with an error status and let a human sort it
out.  However, at that point we can either arrange to have a .INCOMPLETE
file still on disc or not - and it doesn't matter which we choose, we
can still end up with dataloss by picking a specific calamity to have
befallen the sender.

Given this, it makes sense to fall back to a simpler protocol: just send
all the data, then send a "disconnect" message.  This has the same
downside that we need a human to sort out specific failure cases, but
combined with --unlink before sending "disconnect" (see next patch) it
will always be possible for a human to disambiguate, whether the
destination quit with an error status or not.
2012-07-23 10:22:25 +01:00
Alex Young
4790912750 Remove listen mode
Changing behaviour so that instead of rebinding after a successful
migration and continuing as an ordinary server, we simply quit with a
0 exit code and let our caller restart us as a server if they want to.
This means that everything in listen.c, listen.h, and anything making
reference to a rebind address is unneeded.
2012-07-23 09:48:50 +01:00
Alex Young
314c0c2a2a Added the flexnbd break command to stop mirroring 2012-07-17 16:30:49 +01:00
Alex Young
10b46beeea Retry failed rebind attempts
When we receive a migration, if rebinding to the new listen address and
port fails for a reason which might be fixable, rather than killing the
server we retry once a second.  Also in this patch: non-overlapping log
messages and a fix for the client going away halfway through a sendfile
loop.
2012-07-12 14:14:46 +01:00
Alex Young
71b7708964 Minor tidy 2012-07-12 10:22:31 +01:00
Alex Young
eb90308b6e Handle a failed disconnect correctly
If the sender disconnects its socket before sending the disconnect
message, the destination should restart the migration process.  This
patch makes sure that happens.
2012-07-12 09:39:39 +01:00
Alex Young
84dd052465 Fix a test broken by stdout/stderr reshuffle 2012-07-11 10:12:10 +01:00
Alex Young
f3f017a87d Free all possibly held mutexes in error handlers
Now that we have 3 mutexes lying around, it's important that we check
and free these if necessary if error() is called in any thread that can
hold them.  To do this, we now have flexthread.c, which defines a
flexthread_mutex struct.  This is a wrapper around a pthread_mutex_t and
a pthread_t.  The idea is that in the error handler, the thread can
check whether it holds the mutex and can free it if and only if it does.
This is important because pthread fast mutexes can be freed by *any*
thread, not just the thread which holds them.

Note: it is only ever safe for a thread to check if it holds the mutex
itself.  It is *never* safe to check if another thread holds a mutex
without first locking that mutex, which makes the whole operation rather
pointless.
2012-07-11 09:43:16 +01:00
Alex Young
17fe6d3023 Test that a blocked entrust causes a retry 2012-07-03 18:00:31 +01:00
Alex Young
061512f3dc Test that a write reply with the wrong magic will force a retry 2012-07-03 17:01:39 +01:00
Alex Young
a767d4bc8c Test the source handles a dest crash after write correctly 2012-07-03 14:52:27 +01:00
Alex Young
9e67f228f0 Rename a test class 2012-07-03 13:35:47 +01:00
Alex Young
2283b99834 Split acceptance tests into separate files 2012-07-03 13:33:52 +01:00