180 Commits
0.0.2 ... 0.1.0

Author SHA1 Message Date
nick
d87af93cec tests: Add a migration test with many clients connecting in two waves 2013-09-24 10:11:40 +01:00
nick
bc50532321 Fix a current compiler warning 2013-09-23 17:15:56 +01:00
nick
22f92c5df0 Fix a potential compiler warning on 32-bit 2013-09-23 17:15:47 +01:00
nick
78fc65c515 bitset: Rename bitset_stream_on/off as bitset_enable/disable_stream 2013-09-23 17:10:14 +01:00
nick
5c1b119f83 serve: Fix calculation of server_mirror_bytes_remaining
Previously, we didn't count the number of bytes represented by events
in the stream; we just counted each pending event as one byte. Whoops.
2013-09-23 17:09:55 +01:00
nick
f4793c7059 bitset: Rename bitset_mapping to bitset 2013-09-23 16:58:40 +01:00
nick
0f0697a0aa serve: Remove an unused (and incorrect, in any case) function 2013-09-23 16:47:32 +01:00
nick
e98c2f2f05 serve: Fix the sense of allow/forbid_new_clients
We need a migration test where more clients connect after the gong
2013-09-23 16:46:43 +01:00
nick
ebe6c4a8ab mirror: Remove dead code. We still rely on all_dirty in one place. 2013-09-23 14:20:05 +01:00
nick
847b2ec9ad status: Remove useless stats 2013-09-23 14:19:49 +01:00
nick
ca9aea0d13 status: Expose migration_seconds_left 2013-09-23 14:09:25 +01:00
nick
0ae249009c serve/mirror: Move some code tracking migration speed into serve
The rationale is that status will need this information
2013-09-23 13:49:01 +01:00
nick
0f2225becf status: Display number of currently connected clients, and whether new clients are allowed
These will be useful for migration status monitoring - replaces "is pass == 7?"
2013-09-23 13:38:19 +01:00
nick
a6c175ed1d serve: Allow number of clients currently being used to be counted 2013-09-23 13:37:13 +01:00
nick
94654419c5 serve: Add a comment clarifying that a behaviour is safe 2013-09-23 10:53:55 +01:00
nick
e161121c7a flexnbd: Remove unused ".INCOMPLETE" file code
The original idea was that we'd create a .incomplete file at the destination
for mirroring, but that code was removed some time ago. This is all dead, now
2013-09-23 10:38:18 +01:00
nick
150e506780 flexnbd: Remove the server I/O lock as it no longer has any consumers 2013-09-23 10:29:06 +01:00
nick
9a3106f946 flexnbd: Remove the server I/O lock from around NBD requests
NBD doesn't actually guarantee what happens if you have two
concurrent writes to overlapping areas of the disc, and this
mutex was causing us a near-deadlock when the TCP connection
died uncleanly, partway through a request. So now we don't
bother. This actually removes the last user of the server I/O
mutex, so we can remove it completely from the codebase in a
future commit.
2013-09-23 10:22:48 +01:00
nick
71036730c4 Fix a warning in a test 2013-09-23 10:17:50 +01:00
nick
6553907972 mirror: Fix mirroring, break status
This removes the concept of 'passes' completely from mirror.c,
although it leaves the relevant bits in mirror.h to keep status from
failing - although its current code is now Wrong. FIXME.

We also now get the previous test passing, meaning mirroring works
again.
2013-09-20 17:08:14 +01:00
nick
9770bbe42b tests: Fix for the previous commit 2013-09-20 16:53:30 +01:00
nick
6ffa10bf89 flexnbd: Make a test a bit stricter 2013-09-20 16:00:56 +01:00
nick
eb80c0d235 mirror: Remove server I/O lock and dirty map
Given our bitset_stream events, we no longer need to worry about
keeping track of the dirty map. This also lets us rip out the
server I/O lock from mirroring.

It's possible that we can remove the lock from client.c as well at
this point, but I need to have a bit more of a think about possible
races
2013-09-19 15:18:30 +01:00
nick
a5c296f948 mirror: Fix a comment 2013-09-18 16:28:05 +01:00
nick
77a66c85a0 serve: Move bitset freeing to after closing the mirror and clients 2013-09-17 17:30:33 +01:00
nick
0172eb1cba flexnbd: Some comments and a minor fix in client.c to do with the event stream 2013-09-13 15:17:15 +01:00
nick
c3a5eb0600 bitset: add bitset_stream_size and bitset_stream_queued_bytes 2013-09-12 16:54:42 +01:00
nick
0a029fbbf5 bitset: Add an event stream implementation
Nothing is using it yet
2013-09-12 12:30:50 +01:00
nick
83426e1c01 tests: Update check_bitset to use new bitset_free() function 2013-09-11 16:09:27 +01:00
nick
86a000c717 bitset: Some whitespace changes 2013-09-11 15:48:19 +01:00
nick
54a41aacdf bitset: Add a bitset_free() function 2013-09-11 14:41:59 +01:00
nick
487bef1f40 flexnbd: Disconnect clients at the start of a mirror last pass
Currently, we prevent clients from processing requests by taking
the server I/O lock. This leads to requests hanging for a long
time before being terminated when the migration completes, which
is not ideal. With this change, at the start of the final pass,
existing clients are closed and any new connections will be closed
immediately (so no NBD server handshake will be seen).

This is part of the work required to remove the server I/O
lock completely.
2013-09-10 16:03:26 +01:00
nick
0494295705 mirror: Ensure the mirror client socket is closed after a fail, and before a retry 2013-08-27 15:54:59 +01:00
nick
14fde0f2a1 mirror: Remove overly-verbose log line 2013-08-21 14:41:19 +01:00
nick
e13d1d8fb4 mirror: honour max_bytes_per_second - naive scheme
If we're above max_bytes_per_second once we've finished a transfer
(8MB chunks, worst-case) then we delay the next transfer until
all_dirty_bytes / duration < max_bytes_per_second - checking once
per second.

If this isn't good enough, we can improve it - leaky bucket is one
option. To begin with, though, we'll mostly be using this to set
max_bps to either 0 or 100MB/sec or so. So it should be fine.
2013-08-14 16:24:50 +01:00
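The naive scheme above can be sketched as a predicate that a once-per-second timer re-checks before starting the next transfer. The function name and parameters are hypothetical and are not taken from the flexnbd source. Treating a limit of 0 as "unlimited" is an assumption, based on the mention of setting max_bps to 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring the naive scheme: keep delaying the next
 * transfer while the average rate so far is at or above the limit.
 * Assumption: a limit of 0 means "no limit".
 */
bool mirror_should_delay(uint64_t bytes_sent, uint64_t elapsed_secs,
                         uint64_t max_bytes_per_second)
{
    if (max_bytes_per_second == 0) {
        return false;
    }
    if (elapsed_secs == 0) {
        elapsed_secs = 1; /* avoid dividing by zero within the first second */
    }
    return bytes_sent / elapsed_secs >= max_bytes_per_second;
}
```

A caller would start the next (up to 8MB) transfer only once this returns false. As the commit notes, a leaky bucket would smooth bursts better than a cumulative average like this one.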
nick
efdd613968 listen: Turn off CLIENT_MAX_WAIT_SECS
The idea behind this feature was to avoid the client thread in a listen
server getting stuck forever if the mirroring thread in the source died.
However, it breaks any sane implementation of max_Bps in that thread,
and there are lingering concerns over how it might operate under normal
conditions anyway.

Specifically, if iterating over the bitmap takes a long time, or even just
reading the requisite 8MB from the disc in order to send it, then the
5-second timeout could be hit, causing mirroring to fail unnecessarily.
2013-08-14 16:09:55 +01:00
nick
d0022402ae mirror: Start our timeout watcher from the first, not second, transfer 2013-08-14 15:29:24 +01:00
nick
28fff91af1 flexnbd: add a mirror-speed command to change mirror->max_bytes_per_second
It's not actually honoured yet, and ideally, you'd also be able to set it as
part of the initial setup: "flexnbd mirror ... -m 4G". remote_argv for the
mirror case would need to become x=y z=w format first, though.
2013-08-14 13:33:02 +01:00
nick
385c9027db flexnbd status: display mirror->max_bytes_per_second as mirror_speed_limit 2013-08-14 13:30:25 +01:00
nick
b73081e417 One more fix 2013-08-13 16:22:44 +01:00
nick
cc468b0b17 control/mirror: Use uint64_t and strtoull to get max_Bps into the mirror 2013-08-13 12:30:18 +01:00
nick
7128fcc901 control: Output abandoned mirror state 2013-08-13 12:29:53 +01:00
nick
45355666f7 mirror: And another abandon fix 2013-08-12 16:14:53 +01:00
nick
8a294e5ee0 mirror: fix abandon 2013-08-12 15:54:49 +01:00
nick
c6764b0de1 mirror: abandon signals are now honoured outside of the remote end being readable / writable 2013-08-12 15:30:21 +01:00
nick
41facd2ccf Branch merge 2013-08-09 17:07:06 +01:00
nick
f6456349f7 Backed out changeset e58ff57b5e2d
Slows tests down
2013-08-09 17:06:56 +01:00
nick
9f4fbe782c Branch merge 2013-08-09 17:03:25 +01:00
nick
8c750a5e9d listen: Allow longer gaps between transfers 2013-08-09 17:02:58 +01:00
nick
64702d992d Minor fixes here and there 2013-08-09 17:02:33 +01:00
nick
c2df38c9d3 mirror: Use libev to provide an event loop inside the mirror thread
We're doing this so we can implement bandwidth controls sanely.
2013-08-09 17:02:10 +01:00
nick
754949d43f bitset: Add a bitset_run_count_ex that lets you learn the value of the bits in the run 2013-08-09 16:49:38 +01:00
lupine
1a966ca0be bitset: Prove that bitset operations with len=0 don't underflow 2013-07-26 17:09:21 +01:00
nick
f590f8ed3c status: Add migration_speed ( bytes per second ) and migration_duration( seconds ) to the migration output 2013-07-26 11:50:01 +01:00
nick
bc9ce93648 bitset: squash one more bug 2013-07-25 10:58:50 +01:00
nick
a5870b8e9b Remove a stray debugging statement 2013-07-25 10:14:14 +01:00
nick
bed8959d47 bitset: Fix large runs 2013-07-24 17:42:08 +01:00
nick
5c59a412af flexnbd-proxy: ensure upstream cooldown is applied when read init from upstream fails 2013-07-24 16:01:38 +01:00
nick
253cee5a10 flexnbd: Acknowledge new return type of bitset_run_count 2013-07-24 15:08:29 +01:00
nick
7de22a385e flexnbd: clients should be MADV_RANDOM, rather than MADV_SEQUENTIAL 2013-07-24 14:18:23 +01:00
nick
14db3315ca non-debug builds get -O2 for impressive bitset speedups 2013-07-24 12:34:36 +01:00
nick
efe9eaef7c bitset: A more-efficient bit(set)_run_count 2013-07-24 12:03:24 +01:00
nick
f8fd4e0437 bitset: Actually enable an optimization in bit_set/clear_range
Previously, we were setting bits up to the first byte boundary,
memset()ing to the last byte boundary, then ignoring the memset()
and resetting every single bit up to the last one individually,
from where the first for-loop left off.

This should be *at least* nine times faster.
2013-07-24 11:19:52 +01:00
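Here is a minimal sketch of the corrected approach. It assumes bit `i` lives in byte `i / 8` at position `i % 8`; the real bitset layout may differ. The key point is that the tail loop resumes from where the memset() stopped, not from where the head loop left off.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout: bit i is in byte i / 8, at position i % 8. */
static inline void bit_set(uint8_t *b, uint64_t i)
{
    b[i / 8] |= (uint8_t)(1u << (i % 8));
}

/* Set bits [from, from + len). */
void bit_set_range(uint8_t *b, uint64_t from, uint64_t len)
{
    uint64_t end = from + len;

    /* Head: single bits up to the first byte boundary. */
    while (from < end && from % 8 != 0) {
        bit_set(b, from++);
    }

    /* Middle: whole bytes in a single memset() call. */
    uint64_t whole_bytes = (end - from) / 8;
    if (whole_bytes > 0) {
        memset(b + from / 8, 0xff, (size_t)whole_bytes);
        from += whole_bytes * 8;
    }

    /* Tail: only the bits after the last whole byte. */
    while (from < end) {
        bit_set(b, from++);
    }
}
```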
nick
9a37951aaa bitset: Use uint64_t everywhere to avoid possible integer overflows
Hasn't been a problem in practice, mind.
2013-07-24 10:34:22 +01:00
nick
d18423c153 tests: Fix a couple of compile warnings 2013-07-23 17:22:23 +01:00
nick
1b0fe24529 test: Add some tests for bitset_run_count 2013-07-23 17:13:40 +01:00
nick
5c5636b053 flexnbd mirror: If the final run would be longer than the file size, truncate to file size
This fixes migrations of images that are not exactly divisible by 4096
2013-07-23 11:00:51 +01:00
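With a 4096-byte block granularity, the last block of a 10000-byte image nominally covers bytes 8192 to 12287, but only 1808 of those bytes exist. The clamp can be sketched with a hypothetical helper; the name is illustrative, not flexnbd's.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp a run starting at `offset` so it never extends past `size`.
 * Hypothetical helper; requires offset <= size. */
uint64_t clamp_run_to_size(uint64_t offset, uint64_t run_len, uint64_t size)
{
    assert(offset <= size);
    if (run_len > size - offset) { /* written this way to avoid overflow */
        run_len = size - offset;
    }
    return run_len;
}
```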
nick
afe76debf7 flexnbd status: Actually output pass statistics 2013-07-08 14:27:04 +01:00
nick
f4bfc70a4b flexnbd status: Add current pass clean/dirty byte statistics 2013-07-08 13:51:15 +01:00
nick
b29ef6d4de flexnbd status: Avoid a possible NULL dereference reading migration status
While the mirror mutex is taken, the mirroring can be abandoned and serve->mirror
set to NULL, so we need to lock around reading information from serve->mirror
2013-07-08 13:32:14 +01:00
nick
dee0bb27d6 flexnbd status: Add the size of the backing file, in bytes
This will be handy information if you're querying flexnbd for migration
stats, particularly.
2013-07-08 10:11:18 +01:00
nick
f556f298b1 flexnbd status: Add current migration pass to the status output if we're migrating 2013-07-08 09:58:31 +01:00
nick
55b452ebef Fix tests for new killswitch argument 2013-07-03 10:04:08 +01:00
nick
9f34752842 flexnbd: Make the killswitch runtime-selectable
We're not actually using it in production right now because it doesn't
shut its sockets down cleanly enough. This is a better option than
reverting the functionality or keeping production downgraded until
we sort out a handler that cleanly closes the sockets.
2013-07-03 09:56:35 +01:00
nick
81d41f567d proxy: Reduce the reconnect cooldown from 15 seconds to 3.
Exponential backoff would be better, but that's OK
2013-06-20 10:26:34 +01:00
nick
89fd18f6f0 proxy: Add a 30-second timeout for requests in-flight to upstream
It's a little more complicated than that, actually. For the various
states that involve reading from, or writing to, the upstream fd,
if the amount of time spent in that state is > 30 seconds, we reconnect
to the server and resend the request.

we also introduce a 15-second reconnect dampener to keep us from stressing
things unduly. This may need to be decreased, or turned into an exponential
backoff, at some point.
2013-06-19 16:36:19 +01:00
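The two rules above can be sketched as two predicates: a per-state timeout on upstream I/O, and a minimum gap between reconnects. The state names and helpers are illustrative only. The cooldown is a parameter because a later commit reduces it from 15 seconds to 3.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

#define UPSTREAM_TIMEOUT_SECS 30.0 /* from the commit message */

/* Illustrative states; only the upstream ones are subject to the timeout. */
enum proxy_state {
    READ_FROM_DOWNSTREAM,
    WRITE_TO_UPSTREAM,
    READ_FROM_UPSTREAM,
    WRITE_TO_DOWNSTREAM
};

/* True once more than 30 seconds have been spent reading from or writing to
 * upstream, meaning we should reconnect and resend the request. */
bool upstream_timed_out(enum proxy_state s, time_t entered, time_t now)
{
    bool upstream_io = (s == WRITE_TO_UPSTREAM || s == READ_FROM_UPSTREAM);
    return upstream_io && difftime(now, entered) > UPSTREAM_TIMEOUT_SECS;
}

/* Reconnect dampener: do not try again within `cooldown_secs`. */
bool may_reconnect(time_t last_attempt, time_t now, double cooldown_secs)
{
    return difftime(now, last_attempt) >= cooldown_secs;
}
```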
nick
3c56ba0af6 proxy: Fix a comment 2013-06-19 11:27:09 +01:00
nick
2a9884e9e9 proxy: Fix the prefetch code 2013-06-19 11:18:52 +01:00
nick
1afea5c73d proxy: Respect the REQUEST_MASK 2013-06-19 11:18:22 +01:00
nick
62bdad2a6e ioutil: Add a bit more debug output to iobuf_read/write 2013-06-19 11:17:46 +01:00
nick
cd0a1f905f proxy: The minor optimisation bugs if needle is not advanced on iobuf_read() 2013-06-19 11:16:35 +01:00
nick
2156d06368 proxy: DRY up some code 2013-06-18 16:58:39 +01:00
nick
b14bba36ec proxy: Set proxy->upstream_fd before calling proxy_finish_connect_to_upstream
The only thing this affects is a log message
2013-06-18 15:58:38 +01:00
nick
f5c434f21c proxy: Initial move to event-loop proxy model.
Building with -DPREFETCH is currently broken, I'm sure, but otherwise
this version seems to be feature-complete compared to the previous one,
albeit wordier. Upcoming: cleanups
2013-06-18 15:37:39 +01:00
nick
662b9c2d07 readwrite: Expose a couple of points of functionality 2013-06-18 15:36:15 +01:00
nick
197c1131bf tests: Tell us which offset fails 2013-06-18 15:35:24 +01:00
nick
cecf2ebc77 proxy: log details of a request that fails upstream at the warn level 2013-06-07 12:12:12 +01:00
nick
f7e5353355 serve: Add a killswitch that causes the server to uncleanly exit on hang
We define a hang as 120 seconds for now; that should be OK (famous last words).
When I say unclean, I mean it; the control socket is left hanging around too.

This is a workaround for the fact that the client can hang the whole server by
sending a write request header specifying > 0 bytes, then uncleanly going away.
On the server side, we acquire the IO mutex, and then try to read > 0 bytes from
the socket; the data never arrives, and when the client reconnects, its requests
never get a response (since we're waiting on that mutex). Getting rid of that
mutex (which isn't actually needed, except for migration) would be better.
2013-06-06 14:16:20 +01:00
nick
f9fe421472 proxy: Some logging cleanups
New scheme:

Individual requests and extra information about stuff are debug
Lifecycle events are now info.
Problems doing anything are warn.
2013-06-06 12:24:28 +01:00
nick
1b6c10926f docs: Fix the documentation for the loglevel timestamps
We're actually using the system monotonic clock.
2013-06-06 12:23:14 +01:00
nick
24858fcde5 logging: Add a timestamp to the log messages we emit 2013-06-06 11:57:05 +01:00
nick
26c7f1b1c4 mirror: munmap() our range on cleanup 2013-05-30 11:09:24 +01:00
nick
055836c8cb mirror: Don't undo the MADV_SEQUENTIAL hinting over the course of a migration 2013-05-30 11:06:15 +01:00
nick
76cf2dc7b9 mirror: Only say we're unlinking the file if we actually are 2013-05-30 11:05:26 +01:00
nick
a5a7d45355 flexnbd: Add more madvise() hints, both for mirroring out and normal operation.
This is hopefully going to reduce flexnbd rss
2013-05-28 14:16:49 +01:00
Alex Young
e548cc53c8 Formatting fixup 2013-05-01 11:02:46 +01:00
nick
151b739e8d Automated merge with ssh://dev/flexnbd-c 2013-04-30 15:50:09 +01:00
nick
d9b3aab972 flexnbd: Pass MS_INVALIDATE to our msync calls
It's not necessary on Linux, but may be needed elsewhere
2013-04-30 11:04:17 +01:00
Alex Young
574d44f17f Add a trivial read buffer to flexnbd-proxy.
Since the vast majority (something like 94% on boot) are sequential small
reads, and since network latency is a major factor in determining how fast the
exposed device appears to the client, it makes sense for us to try to minimise
the number of network requests where we safely can.

This patch implements the simplest possible read cache in flexnbd-proxy.  When
it receives a read request, if it's a small request then flexnbd-proxy will
double the length of data requested.  On receiving the data from the upstream
server, flexnbd-proxy will return the first half to the downstream as normal,
and stash the second half in a buffer.  If the very next request is a read, and
the offset and length match those of what we have stored, that second request
will be satisfied from the buffer without going out over the network.

The cache is invalidated by any non-read request, or by a disconnection.
2013-04-29 14:50:42 +01:00
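Here is a minimal sketch of the cache logic described above. The struct, the helper names and the 4096-byte "small request" threshold are all assumptions; the commit does not say what counts as small. Only the very next request can hit, so every lookup consumes the cache. On disconnection, the caller would simply set `valid` to false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PREFETCH_SMALL_LIMIT 4096u /* hypothetical "small read" threshold */

struct prefetch {
    bool valid;
    uint64_t from; /* offset of the stashed second half */
    uint32_t len;  /* length of the stashed second half */
    char buf[PREFETCH_SMALL_LIMIT];
};

/* How many bytes to ask upstream for, given a downstream read of `len`. */
uint32_t prefetch_upstream_len(uint32_t len)
{
    return len <= PREFETCH_SMALL_LIMIT ? len * 2 : len;
}

/* After a doubled read of 2 * len bytes at `from`, stash the second half.
 * The first half goes back to the downstream client as normal. */
void prefetch_store(struct prefetch *pf, uint64_t from, uint32_t len,
                    const char *data)
{
    assert(len <= PREFETCH_SMALL_LIMIT);
    memcpy(pf->buf, data + len, len);
    pf->from = from + len;
    pf->len = len;
    pf->valid = true;
}

/* Call for every incoming request. A hit requires a read with exactly the
 * stashed offset and length; any request (hit or not) invalidates the cache. */
bool prefetch_lookup(struct prefetch *pf, bool is_read, uint64_t from,
                     uint32_t len, char *out)
{
    bool hit = pf->valid && is_read && pf->from == from && pf->len == len;
    if (hit) {
        memcpy(out, pf->buf, len);
    }
    pf->valid = false;
    return hit;
}
```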
nick
33ee19dc5a flexnbd-proxy: Add UNIX socket support for the listen address 2013-04-15 16:52:54 +01:00
nick
4e70db8d7f readwrite.c: Set TCP_NODELAY on our NBD client sockets 2013-04-15 15:13:44 +01:00
nick
6984d3709e flexnbd: Don't bind() unless a bind address is specified 2013-04-09 11:47:32 +01:00
nick
2bb8434128 sockutil: Make sockaddr_address_string conform to its comment 2013-03-19 14:47:50 +00:00
nick
e994b80756 proxy: Switch to blocking I/O with signal handlers to exit.
It's safe to terminate the proxy at any point in its lifecycle, so
there's no point using signalfd() (and the associated select() +
non-blocking I/O gubbins) in it. We might want to use non-blocking
I/O in the future for other reasons, of course, at which point it
might become sensible to use signalfd() again. For now, this makes
us reliably responsive to TERM,INT and QUIT in a way that we weren't
previously.
2013-03-19 14:39:04 +00:00
nick
5257e93cb7 flexnbd: Split the proxy mode out into its own binary.
"flexnbd-proxy ..." should be identical in operation to "flexnbd proxy ..."
2013-03-19 13:13:37 +00:00
nick
21ac3cd0ed proxy: Deal with close() failures (and EINTR errnos) comprehensively 2013-03-15 12:07:16 +00:00
nick
f89352aa28 Add an explanatory comment in sock_try_connect() 2013-02-28 12:14:07 +00:00
nick
1d9f055dc7 Turn a couple of FIXME fatals in readwrite.c into warnings 2013-02-28 12:07:21 +00:00
nick
e659a78855 proxy: Fix the return value of a function to match the comment 2013-02-25 15:53:19 +00:00
nick
78299de299 Dummy commit to get past a merge commit 2013-02-21 13:57:33 +00:00
nick
6842864e74 Automated merge with file:///home/lupine/Development/bigv-repos/flexnbd-c-sockutil 2013-02-15 16:53:18 +00:00
nick
98d8fbeaf0 flexnbd: Add a proxy mode
This lets us proxy connections between NBD clients and servers, resiliently.
2013-02-15 16:52:16 +00:00
nick
9b67d30608 serve: Make some error conditions non-fatal, test them.
We don't want flexnbd serve to fall over and die if the client sends an invalid request.
2013-02-15 16:51:28 +00:00
nick
63f7e3e8d4 Fix some sockutil tests 2013-02-15 16:48:23 +00:00
nick
9826dc6c65 Automated merge with ssh://dev/flexnbd-c 2013-02-15 13:36:15 +00:00
nick
0324d3000d branch merge 2013-02-15 13:35:42 +00:00
nick
91085b87fc flexnbd: Add valgrind suppressions for a bug in glibc-2.11 2013-02-15 13:35:21 +00:00
nick
dfa7e1a21b serve: Don't die horribly in the event of EINTR being returned by select() 2013-02-14 16:38:45 +00:00
nick
8281809f42 flexnbd: Fix sock_try_bind so we don't retry on EADDRINUSE 2013-02-14 16:37:14 +00:00
nick
03bc12dd57 flexnbd read/write: Switch to a non-blocking connect() to allow us to time these out 2013-02-14 16:24:10 +00:00
nick
58c4a9530b Make acceptance tests verbose by default 2013-02-14 11:17:44 +00:00
nick
cb7eed28e7 sockutil: Add some tests for sockaddr_address_string 2013-02-13 15:07:30 +00:00
nick
ac560bd907 serve: Refactor some socket utility code into its own module.
We'll be using this in proxy mode later
2013-02-13 13:43:52 +00:00
nick
0fcbe04f80 flexnbd: Remove some obsolete 'rebind' options
They steal short options that I want for other things
2013-02-13 13:11:20 +00:00
nick
f63be84d80 flexnbd: Add some more information to nbdtypes.h 2013-02-08 17:05:22 +00:00
nick
8c04564645 flexnbd: Avoid a SIGSEGV when the allocation map fails to build.
In the event of a fiemap ioctl failing (when the file is on a tmpfs,
for instance), we would free() serve->allocation_map, but it would
remain not NULL, leading to segfaults in client.c when responding to
write requests.

Keeping the free() behaviour is more hassle than it's worth, as there
are synchronization problems with setting serve->allocation_map to
NULL, so we just omit the free() instead to avoid the segfault. This
is safe because we never consult the map until allocation_map_built is
set to true, and we never do that when the builder thread fails.
2013-02-08 16:17:16 +00:00
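The safety argument above relies on publishing the map only after it is complete. The sketch below shows that pattern using C11 atomics with release/acquire ordering. This is a swapped-in technique: the commit does not say how `allocation_map_built` is synchronized in flexnbd. The struct is a hypothetical stand-in. Treating every block as allocated until the map is ready follows commit a49cf14927, which says sparse-write avoidance waits for the builder.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the server fields named in the commit. */
struct server {
    unsigned char *allocation_map;    /* never freed if the build fails */
    atomic_bool allocation_map_built; /* set only once the map is complete */
};

/* End of the builder thread: on failure, leave the flag false. */
void finish_build(struct server *s, bool build_ok)
{
    if (build_ok) {
        /* release: the map's contents become visible before the flag */
        atomic_store_explicit(&s->allocation_map_built, true,
                              memory_order_release);
    }
}

/* Client threads consult the map only after observing the flag. */
bool map_says_allocated(struct server *s, size_t block)
{
    if (!atomic_load_explicit(&s->allocation_map_built,
                              memory_order_acquire)) {
        return true; /* no map yet: assume allocated, so no write is skipped */
    }
    return s->allocation_map[block / 8] & (1u << (block % 8));
}
```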
nick
ecfd108a53 Introduce socket_nbd_write_hello() and a macro to display errno results nicely 2013-02-08 15:53:27 +00:00
nick
56ce7d35c2 Add a debug message for cases where sendfile() fails 2013-02-06 14:41:49 +00:00
nick
2dd3db95bc Automated merge with file:///home/zander/00-projects/17-bigv/new-trial/src/incoming/flexnbd-c 2013-02-06 12:17:40 +00:00
nick
184a13bc9f Add an all-debug task to the makefile 2013-02-05 13:46:55 +00:00
nick
0b3a71bb03 flexnbd: Allocate the right amount of memory for a struct client 2013-02-05 13:27:48 +00:00
nick
719bd30071 Add a minimal Makefile that lets 'make' and 'make clean' do the Right Thing 2013-02-05 09:44:59 +00:00
nick
1afba29b63 flexnbd: Normalise some variable declarations 2013-02-01 15:20:43 +00:00
nick
7583ffbc4d flexnbd: constantize the quiet log level 2013-02-01 15:06:47 +00:00
Alex Young
f002b8ca1f madvise after mirroring to control the RSS 2012-12-28 11:38:54 +00:00
Alex Young
00d7237f66 Remove an errant debug output from test_happy_path.rb 2012-11-21 09:26:12 +00:00
Alex Young
ed70dacf2f Don't skip parts of a file when calling fiemap
A mis-incremented offset in the fiemap-processing code meant that
non-sparse portions of files were missed.
2012-11-20 17:24:19 +00:00
Alex Young
4f650d85c2 Fix the error message for flexnbd write --help 2012-11-20 15:09:48 +00:00
Alex Young
dcef6d29e5 Allocate the bitset in the foreground thread.
This prevents the possibility of a race in dereferencing it in the
client threads.
2012-10-09 17:54:00 +01:00
Alex Young
22bea81445 Don't open the control socket until after the server socket is bound
This makes it easier for the tests (and supervisor) to guarantee to be
able to connect to the server socket.

Also this patch moves freeing the mirror supervisor into the server
thread.
2012-10-09 17:35:20 +01:00
Alex Young
83eb31aba4 Merge 2012-10-09 17:28:41 +01:00
Alex Young
161d2fccf1 Rename serve->has_control to serve->success.
This makes the use of this variable to signal an unexpected SIGTERM
while migrating less confusing.
2012-10-09 17:20:39 +01:00
mbloch
029ebb5ef4 Fixed build_allocation_map in ioutil.c to correctly traverse fiemaps where
there are more than 1000 extents in a 100MB file chunk.
2012-10-08 18:11:21 +01:00
Alex Young
a039ceffcb Merge 2012-10-08 16:02:37 +01:00
Alex Young
062ecca1fd Backed out changeset c25e7d82e56e
This causes test failures under valgrind, and we don't need the
reordering with a background allocation map builder.
2012-10-08 16:01:25 +01:00
Alex Young
cf62b10adf Nullcheck *before* dereferencing.
Also bracketing, replacing a lost comment, and some variable naming.
2012-10-08 14:54:10 +01:00
Matthew Bloch
a49cf14927 Block allocation map is now built in a separate thread, and does not delay
server startup (sparse write avoidance doesn't happen until it is finished).
Added mutex to bitset functions, which were already being called from
multiple threads.  Rewrote allocation map builder to request file
information in multiple chunks, to avoid uninterruptible wait and dynamic
memory allocation.
2012-10-07 21:55:01 +01:00
Matthew Bloch
7b13964c39 Update Rakefile to support locally-installed libcheck, removed efence, pushed
-l arguments to end of link command line.
2012-10-07 02:09:34 +01:00
Alex Young
1fa8ba82a5 Merge 2012-10-04 14:51:54 +01:00
Alex Young
f3e0d61323 Quit with an error status on SIGTERM during migration
This prevents the supervisor from thinking that the migration completed
successfully.

In order to do this, I've introduced a new lock around the start (and
finish) of the migration so that we avoid a race between the signal
handler in the server_accept loop and the control thread mirror startup.
Without that, we'd risk successfully starting a migration after the
SIGTERM handler fired, which would be Bad.
2012-10-04 14:41:55 +01:00
nick
32cae67a75 flexnbd: Move building the allocation map to before server socket bind()
Building the allocation map takes time, which scales with the size of the disc
being presented. By building that map in the space between bind() and accept(),
we leave the process in a useless state after the only good signal we have for
"we are ready" and the state where it is actually ready. This was breaking
migrations of large files.
2012-09-25 11:47:44 +01:00
nick
ccbfce1075 Whitespace 2012-09-20 13:37:48 +01:00
Alex Young
ddc57e76d1 Remove an unneeded sanity check from the tests 2012-09-13 15:13:20 +01:00
Alex Young
1d9c88d4ca Add the write-during-migration test to the acceptance test run 2012-09-13 14:41:50 +01:00
Alex Young
8b43321ef2 Fix for deadlocks when writing while migrating 2012-09-13 12:21:43 +01:00
nick
13328910c8 Add a test case that tickles a deadlock bug when migrating active source discs 2012-09-12 17:13:33 +01:00
Alex Young
50001cd6e7 Merge 2012-09-12 15:43:15 +01:00
Alex Young
ccf5baa956 Add a -dbg package to the debian build 2012-09-12 15:42:58 +01:00
nick
ee652a2965 Fix some races in the acceptance tests 2012-09-11 16:21:35 +01:00
nick
e724d83bec Ensure fiemap ioctl calls are synchronous. 2012-09-11 15:37:13 +01:00
Alex Young
239136064a Add default empty LDFLAGS 2012-08-24 09:32:33 +01:00
Alex Young
c3c621f750 Don't free a client which hasn't finished yet. 2012-08-23 17:51:19 +01:00
Alex Young
c5dfe16f35 Don't close the same file descriptor more than once. 2012-08-23 16:01:37 +01:00
Alex Young
b1a4db2727 Further merge fail fix
The reversal of the control protocol lines for the mirror command wasn't
complete.
2012-07-24 14:19:53 +01:00
nick
2c0f86c018 Fix a merge fail 2012-07-24 09:21:40 +01:00
Alex Young
53eca40fad Fix tests broken by entrust removal
Missed check_readwrite and check_flexnbd
2012-07-23 15:45:39 +01:00
Alex Young
33f95e1986 Add the --unlink option to mirror
This deletes the local file before tearing down the mirror connection,
allowing us to avoid an ambiguous recovery situation.
2012-07-23 13:39:27 +01:00
Alex Young
fd935ce4c9 Simplify the migration handover protocol
The three-way hand-off has a problem: there's no way to arrange for the
state of the migration to be unambiguous in case of failure.  If the
final "disconnect" message is lost (as in, the destination never
receives it whether it is sent by the sender or not), the destination
has no option but to quit with an error status and let a human sort it
out.  However, at that point we can either arrange to have a .INCOMPLETE
file still on disc or not - and it doesn't matter which we choose, we
can still end up with dataloss by picking a specific calamity to have
befallen the sender.

Given this, it makes sense to fall back to a simpler protocol: just send
all the data, then send a "disconnect" message.  This has the same
downside that we need a human to sort out specific failure cases, but
combined with --unlink before sending "disconnect" (see next patch) it
will always be possible for a human to disambiguate, whether the
destination quit with an error status or not.
2012-07-23 10:22:25 +01:00
Alex Young
f6f4266fd6 Update the README for new listen behaviour
Get rid of references to rebind addresses and update the usage examples.
2012-07-23 10:10:47 +01:00
Alex Young
4790912750 Remove listen mode
Changing behaviour so that instead of rebinding after a successful
migration and continuing as an ordinary server, we simply quit with a
0 exit code and let our caller restart us as a server if they want to.
This means that everything in listen.c, listen.h, and anything making
reference to a rebind address is unneeded.
2012-07-23 09:48:50 +01:00
Alex Young
77f4ac29c6 Include strerror(errno) in stat debug output 2012-07-20 09:51:53 +01:00
Alex Young
b0f1a027c6 Add .INCOMPLETE file marker to flexnbd listen
We drop a marker onto the filesystem to say when we know the image we're
serving is not yet ready.
2012-07-19 17:34:20 +01:00
Alex Young
76bbdb4889 Force gzipping the man page 2012-07-19 17:22:25 +01:00
Alex Young
314c0c2a2a Added the flexnbd break command to stop mirroring 2012-07-17 16:30:49 +01:00
Alex Young
1caa3d4e27 Make an EADDRINUSE on server bind fatal.
This is important because if we try to rebind after a migration and
someone else is in the way, any clients trying to reconnect to us will
instead be connecting to the squatter.
2012-07-16 12:34:39 +01:00
Alex Young
2e20e7197a Add the pid to the status output
This will be needed if we daemonise flexnbd.
2012-07-16 11:50:59 +01:00
Alex Young
8814894874 Test setting an ACL 2012-07-16 11:38:01 +01:00
Alex Young
66ff06fe0e Block a second mirror attempt
If a second mirror command is run while the first is still going,
flexnbd needs to prevent the second because we only have one dirty map.
Also, the shutdown becomes Complicated if we allow more than one mirror
at a time.
2012-07-16 11:21:56 +01:00
Alex Young
db30ea0c48 Better error handling for remotes 2012-07-16 11:04:45 +01:00
Alex Young
9a81af5f8f Added tag 0.0.2 for changeset 99b403167181 2012-07-16 10:49:03 +01:00
75 changed files with 6333 additions and 1951 deletions

Makefile

@@ -0,0 +1,10 @@
#!/usr/bin/make -f

all:
	rake build

all-debug:
	DEBUG=1 rake build

clean:
	rake clean

README.proxy.txt

@@ -0,0 +1,184 @@
FLEXNBD-PROXY(1)
================
:doctype: manpage

NAME
----

flexnbd-proxy - A simple NBD proxy

SYNOPSIS
--------

*flexnbd-proxy* ['OPTIONS']

DESCRIPTION
-----------

flexnbd-proxy is a simple NBD proxy server that implements resilient
connection logic for the client. It connects to an upstream NBD server
and allows a single client to connect to it. All server properties are
proxied to the client, and the client connection is kept alive across
reconnections to the upstream server. If the upstream goes away while
an NBD request is in-flight then the proxy (silently, from the point
of view of the client) reconnects and retransmits the request, before
returning the response to the client.

USAGE
-----

  $ flexnbd-proxy --addr <ADDR> [--port <PORT>]
      --conn-addr <ADDR> --conn-port <PORT> [--bind <ADDR>] [option]*

Proxy requests from an NBD client to an NBD server, resiliently. Only one
client can be connected at a time, and ACLs cannot be applied to the client
in the way they can be to clients connecting directly to a flexnbd in serve mode.

On starting up, the proxy will attempt to connect to the server specified by
--conn-addr and --conn-port (from the address specified by --bind, if given). If
it fails, the process will die with an error exit status.

Assuming a successful connection to the `upstream` server is made, the proxy
will then start listening on the address specified by --addr and --port, waiting
for `downstream` to connect to it (this will be your NBD client). The client
will be given the same hello message as the proxy was given by the server.

When connected, any request the client makes will be read by the proxy and sent
to the server. If the server goes away for any reason, the proxy will remember
the request and regularly (~ every 5 seconds) try to reconnect to the server.
Upon reconnection, the request is resent and the proxy waits for a reply. When
the reply is received, it is sent back to the client.

When the client disconnects, cleanly or otherwise, the proxy goes back to
waiting for a new client to connect. The connection to the server is maintained
at that point, in case it is needed again.

Only one request may be in-flight at a time under the current architecture; that
doesn't seem to slow things down much relative to alternative options, but may
be changed in the future if it becomes an issue.
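The retry behaviour described above can be sketched as a small per-request state machine. The step names and the transition function are hypothetical; they are not flexnbd-proxy's actual event-loop states. The caller holds the request for the whole cycle. As a result, any upstream failure leads back to reconnecting and resending, rather than to an error for the client.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-request steps; names are illustrative only. */
enum step {
    STEP_CONNECT,        /* (re)connect to the upstream server */
    STEP_SEND,           /* send the remembered request */
    STEP_READ,           /* wait for the upstream reply */
    STEP_WAIT,           /* pause (~5 s per the text) before reconnecting */
    STEP_REPLY_TO_CLIENT /* done: pass the reply back downstream */
};

/* Advance one step; `ok` says whether the current step succeeded. */
enum step next_step(enum step cur, bool ok)
{
    switch (cur) {
    case STEP_CONNECT:         return ok ? STEP_SEND : STEP_WAIT;
    case STEP_SEND:            return ok ? STEP_READ : STEP_WAIT;
    case STEP_READ:            return ok ? STEP_REPLY_TO_CLIENT : STEP_WAIT;
    case STEP_WAIT:            return STEP_CONNECT;
    case STEP_REPLY_TO_CLIENT: return STEP_REPLY_TO_CLIENT;
    }
    return STEP_WAIT; /* unreachable for valid enum values */
}
```

Suppose the proxy starts at `STEP_SEND` with a live upstream and the upstream fails during `STEP_READ`. The machine then passes through `STEP_WAIT` and `STEP_CONNECT` (repeating while the server is still down) back to `STEP_SEND`, and the remembered request is resent.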

Options
~~~~~~~

*--addr, -l ADDR*:
The address to listen on. If this begins with a '/', it is assumed to be
a UNIX domain socket to create. Otherwise, it should be an IPv4 or IPv6
address.

*--port, -p PORT*:
The port to listen on, if --addr is not a UNIX socket.

*--conn-addr, -C ADDR*:
The address of the NBD server to connect to. Required.

*--conn-port, -P PORT*:
The port of the NBD server to connect to. Required.

*--help, -h*:
Show command or global help.

*--verbose, -v*:
Output all available log information to STDERR.

*--quiet, -q*:
Output as little log information as possible to STDERR.
LOGGING
-------
Log output is sent to STDERR. If --quiet is set, no output will be seen
unless the program terminates abnormally. If neither --quiet nor
--verbose are set, no output will be seen unless something goes wrong
with a specific request. If --verbose is given, every available log
message will be seen (which, for a debug build, is many). It is not an
error to set both --verbose and --quiet. The last one wins.
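The level rules can be sketched as a threshold computed from the command line, with the last of --quiet/--verbose winning. This is a hypothetical helper (`log_threshold` is not a flexnbd function), and the default threshold of 'W' used when neither flag is given is an assumption drawn from "unless something goes wrong"; the text does not state the exact default.

```c
#include <string.h>

/* Levels in increasing order of severity, matching 'D','I','W','E','F'. */
enum log_level { LOG_D, LOG_I, LOG_W, LOG_E, LOG_F };

/* Return the minimum level that will be printed, given the command-line
 * arguments. --quiet and --verbose may both appear; the last one wins.
 * ASSUMPTION: with neither flag, the threshold is 'W'. */
enum log_level log_threshold(int argc, char **argv, int debug_build)
{
    enum log_level threshold = LOG_W;   /* assumed default */
    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "--quiet") == 0 || strcmp(argv[i], "-q") == 0) {
            threshold = LOG_F;
        } else if (strcmp(argv[i], "--verbose") == 0 ||
                   strcmp(argv[i], "-v") == 0) {
            /* 'D' entries are only produced by debug builds. */
            threshold = debug_build ? LOG_D : LOG_I;
        }
    }
    return threshold;
}

/* Nonzero if a message at ''level'' passes the threshold. */
int log_should_print(enum log_level level, enum log_level threshold)
{
    return level >= threshold;
}
```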
The log line format is:
<TIMESTAMP>:<LEVEL>:<PID> <THREAD> <SOURCEFILE>:<SOURCELINE>: <MSG>
*TIMESTAMP*:
Time the log entry was made. This is expressed in milliseconds from a monotonic clock.
*LEVEL*:
This will be one of 'D', 'I', 'W', 'E', 'F' in increasing order of
severity. If flexnbd is started with the --quiet flag, only 'F' will be
seen. If it is started with the --verbose flag, any from 'I' upwards
will be seen. Only if you have a debug build and start it with
--verbose will you see 'D' entries.
*PID*:
This is the process ID.
*THREAD*:
flexnbd-proxy is currently single-threaded, so this should be the same
for all lines. That may not be the case in the future.
*SOURCEFILE:SOURCELINE*:
Identifies where in the source code this log line can be found.
*MSG*:
A short message describing what's happening, how it's being done, or
if you're very lucky *why* it's going on.
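To make the line format concrete, here is a minimal formatter and a matching helper that pulls the LEVEL field back out, for example to filter captured STDERR. Both functions are hypothetical illustrations; in particular, rendering THREAD as an unsigned integer is an assumption, since the text does not say how thread identifiers are printed.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Format one log line in the documented layout:
 *   <TIMESTAMP>:<LEVEL>:<PID> <THREAD> <SOURCEFILE>:<SOURCELINE>: <MSG>
 * ''ms'' would typically come from clock_gettime(CLOCK_MONOTONIC, ...)
 * converted to milliseconds. ASSUMPTION: THREAD is printed as a number.
 * Returns the snprintf result (the length the full line needs). */
int format_log_line(char *buf, size_t size, uint64_t ms, char level,
                    pid_t pid, unsigned long thread,
                    const char *file, int line, const char *msg)
{
    return snprintf(buf, size, "%" PRIu64 ":%c:%d %lu %s:%d: %s",
                    ms, level, (int)pid, thread, file, line, msg);
}

/* Extract the LEVEL field (the character between the first two ':'),
 * or return 0 if the line does not look like a log line. */
char log_line_level(const char *line)
{
    const char *colon = strchr(line, ':');
    return (colon && colon[1] && colon[2] == ':') ? colon[1] : 0;
}
```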
Proxying
~~~~~~~~
The main point of the proxy mode is to allow clients that would otherwise break
when the NBD server goes away (during a migration, for instance) to see a
persistent TCP connection throughout the process, instead of needing their own
reconnection logic.
For maximum reliability, the proxy process should be run on the same machine as
the actual NBD client; an example might look like:
nbd-server-1$ flexnbd serve -l 10.0.0.1 -p 4777 -f myfile [...]
nbd-client-1$ flexnbd-proxy -l 127.0.0.1 -p 4777 -C 10.0.0.1 -P 4777
nbd-client-1$ nbd-client -c 127.0.0.1 4777 /dev/nbd0
nbd-server-2$ flexnbd listen -l 10.0.0.2 -p 4777 -f myfile [...]
nbd-server-1$ flexnbd mirror --addr 10.0.0.2 -p 4777 [...]
Upon completing the migration, the mirroring and listening flexnbd servers will
both exit. With the proxy mediating requests, this does not break the TCP
connection that nbd-client is holding open. If no requests are in-flight, it
will not notice anything at all; if requests are in-flight, then the reply may
take longer than usual to be returned.
When flexnbd is restarted in serve mode on the second server:
nbd-server-2$ flexnbd serve -l 10.0.0.1 -p 4777 -f myfile [...]
The proxy notices and reconnects, fulfilling any request it has in its buffer.
The data in myfile has been moved between physical servers without the nbd
client process having to be disturbed at all.
BUGS
----
Should be reported to nick@bytemark.co.uk.
Current issues include:
* Only old-style NBD negotiation is supported
* Only one request may be in-flight at a time
* All I/O is blocking, and signals terminate the process immediately
* UNIX socket support is limited to the listen address
* FLUSH and TRIM commands, and the FUA flag, are not supported
* DISCONNECT requests do not get passed through to the NBD server
* No active timeout-retry of requests - we trust the kernel's idea of failure
AUTHOR
------
Written by Alex Young <alex@bytemark.co.uk>.
Original concept and core code by Matthew Bloch <matthew@bytemark.co.uk>.
Proxy mode written by Nick Thomas <nick@bytemark.co.uk>.
COPYING
-------
Copyright (c) 2012 Bytemark Hosting Ltd. Free use of this software is
granted under the terms of the GNU General Public License version 3 or
later.


@@ -59,46 +59,37 @@ listen
~~~~~~ ~~~~~~
$ flexnbd listen --addr <ADDR> --port <PORT> --file <FILE> $ flexnbd listen --addr <ADDR> --port <PORT> --file <FILE>
[--rebind-addr <REBIND-ADDR>] [--rebind-port <REBIND-PORT>]
[--sock <SOCK>] [--default-deny] [global option]* [acl entry]* [--sock <SOCK>] [--default-deny] [global option]* [acl entry]*
Listen for an inbound migration, then serve it as normal once it has Listen for an inbound migration, and quit with a status of 0 on
completed. completion.
flexnbd will wait for a successful migration, and then switch into flexnbd will wait for a successful migration, and then quit. The file
'serve' mode. The file to write the inbound migration data to must to write the inbound migration data to must already exist before you
already exist before you run 'flexnbd listen'. run 'flexnbd listen'.
Only one sender may connect to send data, and the server is not Only one sender may connect to send data, and if the sender
available to clients while the migration is taking place. disconnects part-way through the migration, the destination will
expect it to reconnect and retry the whole migration. It isn't safe
If the sender disconnects part-way through the migration, the to assume that a partial migration can be resumed because the
destination will expect it to reconnect and retry the whole migration. destination has no knowledge of whether a client has made a write to
It isn't safe to assume that a partial migration can be resumed because
the destination has no knowledge of whether a client has made a write to
the source in the interim. the source in the interim.
To support transparently replacing an existing server, flexnbd can If the migration fails for a reason which the `flexnbd listen` process
switch addresses once it has received a successful migration. can't fix (say, a failed local write), it will exit with an error
status. In this case, the sender will continually retry the migration
until it succeeds, and you will need to restart the `flexnbd listen`
process to allow that to happen.
Options Options
^^^^^^^ ^^^^^^^
As for 'serve', with these additions: As for 'serve'.
*--rebind-addr, -L REBIND_ADDR*:
The address to rebind to once migration has completed.
*--rebind-port, -P REBIND_PORT*:
The port to rebind to once migration has completed.
Either, both, or neither of --rebind-port and rebind-addr may be given.
If rebinding fails, flexnbd will retry every second until it succeeds.
mirror mirror
~~~~~~ ~~~~~~
$ flexnbd mirror --addr <ADDR> --port <PORT> --sock SOCK $ flexnbd mirror --addr <ADDR> --port <PORT> --sock SOCK
[--bind <BIND-ADDR>] [global option]* [--unlink] [--bind <BIND-ADDR>] [global option]*
Start a migration from the server with control socket SOCK to the server Start a migration from the server with control socket SOCK to the server
listening at ADDR:PORT. listening at ADDR:PORT.
@@ -115,7 +106,15 @@ again. It is not safe to resume the migration from where it left off
because the source can't see that the backing store behind the because the source can't see that the backing store behind the
destination is intact, or even on the same machine. destination is intact, or even on the same machine.
Note: files smaller than 4096 bytes cannot be migrated. If the `--unlink` option is given, the local file will be deleted
immediately before the mirror connection is terminated. This allows
an otherwise-ambiguous situation to be resolved: if you don't unlink
the file and the flexnbd process at either end is terminated, it's not
possible to tell which copy of the data is canonical. Since the
unlink happens as soon as the sender knows that it has transmitted all
the data, there can be no ambiguity.
Note: files smaller than 4096 bytes cannot be mirrored.
Options Options
^^^^^^^ ^^^^^^^
@@ -129,10 +128,29 @@ Options
*--sock, -s SOCK*: *--sock, -s SOCK*:
The control socket of the local server to migrate from. Required. The control socket of the local server to migrate from. Required.
*--unlink, -u*:
Unlink the served file from the local filesystem after successfully
mirroring.
*--bind, -b BIND-ADDR*: *--bind, -b BIND-ADDR*:
The local address to bind to. You may need this if the remote server The local address to bind to. You may need this if the remote server
is using an access control list. is using an access control list.
break
~~~~~
$ flexnbd break --sock SOCK [global option]*
Stop a running migration.
Options
^^^^^^^
*--sock, -s SOCK*:
The control socket of the local server whose emigration to stop.
Required.
acl acl
~~~ ~~~
@@ -160,12 +178,14 @@ The status will be printed to STDOUT. It is a space-separated list of
key=value pairs. The space character will never appear in a key or key=value pairs. The space character will never appear in a key or
value. Currently reported values are: value. Currently reported values are:
*pid*:
The process id of the server listening on SOCK.
*is_mirroring*: *is_mirroring*:
'true' if this server is sending migration data, 'false' otherwise. 'true' if this server is sending migration data, 'false' otherwise.
*has_control*: *has_control*:
'false' if this server was started in 'listen' mode and has not yet 'false' if this server was started in 'listen' mode. 'true' otherwise.
received a successful migration. 'true' otherwise.
read read
~~~~ ~~~~
@@ -258,7 +278,10 @@ error to set both --verbose and --quiet. The last one wins.
The log line format is: The log line format is:
<LEVEL>:<PID> <THREAD> <SOURCEFILE>:<SOURCELINE>: <MSG> <TIMESTAMP>:<LEVEL>:<PID> <THREAD> <SOURCEFILE>:<SOURCELINE>: <MSG>
*TIMESTAMP*:
Time the log entry was made. This is expressed in terms of monotonic ms.
*LEVEL*: *LEVEL*:
This will be one of 'D', 'I', 'W', 'E', 'F' in increasing order of This will be one of 'D', 'I', 'W', 'E', 'F' in increasing order of
@@ -305,7 +328,7 @@ In order to read a server's status, we need it to open a control socket.
$ flexnbd serve --file /tmp/passwd --addr 0.0.0.0 --port 4777 \ $ flexnbd serve --file /tmp/passwd --addr 0.0.0.0 --port 4777 \
--sock /tmp/flexnbd.sock --sock /tmp/flexnbd.sock
$ flexnbd status --sock /tmp/flexnbd.sock $ flexnbd status --sock /tmp/flexnbd.sock
is_mirroring=false has_control=true pid=9635 is_mirroring=false has_control=true
$ $
@@ -316,7 +339,7 @@ Migrating
To migrate, we need to provide a destination file of the right size. To migrate, we need to provide a destination file of the right size.
$ dd if=/dev/random of=/tmp/data bs=1M count=1 $ dd if=/dev/urandom of=/tmp/data bs=1024 count=1K
$ truncate -s 1M /tmp/data.copy $ truncate -s 1M /tmp/data.copy
$ flexnbd serve --file /tmp/data --addr 0.0.0.0 --port 4778 \ $ flexnbd serve --file /tmp/data --addr 0.0.0.0 --port 4778 \
--sock /tmp/flex-source.sock & --sock /tmp/flex-source.sock &
@@ -328,9 +351,9 @@ Now we check the status of each server, to check that they are both in
the right state: the right state:
$ flexnbd status --sock /tmp/flex-source.sock $ flexnbd status --sock /tmp/flex-source.sock
is_mirroring=false has_control=true pid=9648 is_mirroring=false has_control=true
$ flexnbd status --sock /tmp/flex-dest.sock $ flexnbd status --sock /tmp/flex-dest.sock
is_mirroring=false has_control=false pid=9651 is_mirroring=false has_control=false
$ $
With this knowledge in hand, we can start the migration: With this knowledge in hand, we can start the migration:
@@ -339,16 +362,12 @@ With this knowledge in hand, we can start the migration:
--sock /tmp/flex-source.sock --sock /tmp/flex-source.sock
Migration started Migration started
[1] + 9648 done build/flexnbd serve --addr 0.0.0.0 --port 4778 [1] + 9648 done build/flexnbd serve --addr 0.0.0.0 --port 4778
[2] + 9651 done build/flexnbd listen --addr 0.0.0.0 --port 4779
$ $
Note that because the file is so small in this case, we see the source Note that because the file is so small in this case, we see the source
server quit soon after we start the migration. server quit soon after we start the migration, and the destination
exited at roughly the same time.
We can check the status of the destination server, to ensure that it
took control:
$ flexnbd status --sock /tmp/flex-dest.sock
is_mirroring=false has_control=true
BUGS BUGS
---- ----
@@ -359,8 +378,8 @@ AUTHOR
------ ------
Written by Alex Young <alex@bytemark.co.uk>. Written by Alex Young <alex@bytemark.co.uk>.
Original concept and core code by Matthew Bloch Original concept and core code by Matthew Bloch <matthew@bytemark.co.uk>.
<matthew@bytemark.co.uk>. Some additions by Nick Thomas <nick@bytemark.co.uk>
COPYING COPYING
------- -------
@@ -368,3 +387,4 @@ COPYING
Copyright (c) 2012 Bytemark Hosting Ltd. Free use of this software is Copyright (c) 2012 Bytemark Hosting Ltd. Free use of this software is
granted under the terms of the GNU General Public License version 3 or granted under the terms of the GNU General Public License version 3 or
later. later.
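The `status` output described in the README changes above is easy to consume programmatically because spaces never appear inside keys or values. A minimal lookup sketch follows; `status_get` is a hypothetical helper, not part of flexnbd, and it also tolerates the trailing newline that STDOUT output will usually carry.

```c
#include <stddef.h>
#include <string.h>

/* Look up ''key'' in a `flexnbd status` line such as
 * "is_mirroring=false has_control=true". Copies the value into ''out''
 * (NUL-terminated, truncated to out_size-1) and returns 1, or returns 0
 * if the key is absent. Relies on the documented guarantee that spaces
 * never appear inside keys or values. */
int status_get(const char *line, const char *key, char *out, size_t out_size)
{
    size_t klen = strlen(key);
    const char *p = line;

    while (*p) {
        while (*p == ' ' || *p == '\n') p++;      /* skip separators */
        size_t len = strcspn(p, " \n");           /* length of one key=value */

        if (len > klen && strncmp(p, key, klen) == 0 && p[klen] == '=') {
            size_t vlen = len - klen - 1;
            if (out_size == 0) return 1;
            if (vlen >= out_size) vlen = out_size - 1;
            memcpy(out, p + klen + 1, vlen);
            out[vlen] = '\0';
            return 1;
        }
        p += len;
    }
    return 0;
}
```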

124
Rakefile

@@ -8,12 +8,21 @@ DEBUG = ENV.has_key?('DEBUG') &&
%w|yes y ok 1 true t|.include?(ENV['DEBUG']) %w|yes y ok 1 true t|.include?(ENV['DEBUG'])
ALL_SOURCES = FileList['src/*'] ALL_SOURCES = FileList['src/*']
SOURCES = ALL_SOURCES.select { |c| c =~ /\.c$/ }
OBJECTS = SOURCES.pathmap( "%{^src,build}X.o" ) PROXY_ONLY_SOURCES = FileList['src/{proxy-main,proxy}.c']
PROXY_ONLY_OBJECTS = PROXY_ONLY_SOURCES.pathmap( "%{^src,build}X.o" )
SOURCES = ALL_SOURCES.select { |c| c =~ /\.c$/ } - PROXY_ONLY_SOURCES
OBJECTS = SOURCES.pathmap( "%{^src,build}X.o" ) - PROXY_ONLY_OBJECTS
PROXY_SOURCES = FileList['src/{ioutil,nbdtypes,readwrite,sockutil,util,parse}.c'] + PROXY_ONLY_SOURCES
PROXY_OBJECTS = PROXY_SOURCES.pathmap( "%{^src,build}X.o" )
TEST_SOURCES = FileList['tests/unit/*.c'] TEST_SOURCES = FileList['tests/unit/*.c']
TEST_OBJECTS = TEST_SOURCES.pathmap( "%{^tests/unit,build/tests}X.o" ) TEST_OBJECTS = TEST_SOURCES.pathmap( "%{^tests/unit,build/tests}X.o" )
LIBS = %w( pthread ) LIBS = %w( pthread )
LDFLAGS = ["-lrt -lev"]
CCFLAGS = %w( CCFLAGS = %w(
-D_GNU_SOURCE=1 -D_GNU_SOURCE=1
-Wall -Wall
@@ -23,8 +32,10 @@ CCFLAGS = %w(
-Wno-missing-field-initializers -Wno-missing-field-initializers
) + # Added -Wno-missing-field-initializers to shut GCC up over {0} struct initialisers ) + # Added -Wno-missing-field-initializers to shut GCC up over {0} struct initialisers
[ENV['CFLAGS']] [ENV['CFLAGS']]
LDFLAGS = []
LIBCHECK = "/usr/lib/libcheck.a" LIBCHECK = File.exists?("/usr/lib/libcheck.a") ?
"/usr/lib/libcheck.a" :
"/usr/local/lib/libcheck.a"
TEST_MODULES = Dir["tests/unit/check_*.c"].map { |n| TEST_MODULES = Dir["tests/unit/check_*.c"].map { |n|
File.basename( n )[%r{check_(.+)\.c},1] } File.basename( n )[%r{check_(.+)\.c},1] }
@@ -32,29 +43,43 @@ TEST_MODULES = Dir["tests/unit/check_*.c"].map { |n|
if DEBUG if DEBUG
LDFLAGS << ["-g"] LDFLAGS << ["-g"]
CCFLAGS << ["-g -DDEBUG"] CCFLAGS << ["-g -DDEBUG"]
else
CCFLAGS << "-O2"
end end
desc "Build the binary and man page" desc "Build the binary and man page"
task :build => ['build/flexnbd', 'build/flexnbd.1.gz'] task :build => [:flexnbd, :flexnbd_proxy, :man]
task :default => :build task :default => :build
desc "Build just the binary" desc "Build just the flexnbd binary"
task :flexnbd => "build/flexnbd" task :flexnbd => "build/flexnbd"
desc "Build just the flexnbd-proxy binary"
task :flexnbd_proxy => "build/flexnbd-proxy"
def check(m) def check(m)
"build/tests/check_#{m}" "build/tests/check_#{m}"
end end
file "README.txt" file "README.txt"
file "README.proxy.txt"
def manpage(name, src)
FileUtils.mkdir_p( "build" )
sh "a2x --destination-dir build --format manpage #{src}"
sh "gzip -f build/#{name}"
end
file "build/flexnbd.1.gz" => "README.txt" do file "build/flexnbd.1.gz" => "README.txt" do
FileUtils.mkdir_p( "build" ) manpage("flexnbd.1", "README.txt")
sh "a2x --destination-dir build --format manpage README.txt" end
sh "gzip build/flexnbd.1"
file "build/flexnbd-proxy.1.gz" => "README.proxy.txt" do
manpage("flexnbd-proxy.1", "README.proxy.txt")
end end
desc "Build just the man page" desc "Build just the man page"
task :man => "build/flexnbd.1.gz" task :man => ["build/flexnbd.1.gz", "build/flexnbd-proxy.1.gz"]
namespace "test" do namespace "test" do
@@ -80,8 +105,8 @@ namespace "test" do
end end
desc "Run NBD test scenarios" desc "Run NBD test scenarios"
task 'scenarios' => 'flexnbd' do task 'scenarios' => ['build/flexnbd', 'build/flexnbd-proxy'] do
sh "cd tests/acceptance; ruby nbd_scenarios" sh "cd tests/acceptance; ruby nbd_scenarios -v"
end end
end end
@@ -96,16 +121,20 @@ def gcc_link(target, objects)
FileUtils.mkdir_p File.dirname( target ) FileUtils.mkdir_p File.dirname( target )
sh "#{CC} #{LDFLAGS.join(' ')} "+ sh "#{CC} #{LDFLAGS.join(' ')} "+
LIBS.map { |l| "-l#{l}" }.join(" ")+
" -Isrc " + " -Isrc " +
" -o #{target} "+ " -o #{target} "+
objects.join(" ") objects.join(" ") +
" "+LIBS.map { |l| "-l#{l}" }.join(" ")
end end
def headers(c) def headers(c)
`#{CC} -Isrc -MM #{c}`.gsub("\\\n", " ").split(" ")[2..-1] `#{CC} -Isrc -MM #{c}`.gsub("\\\n", " ").split(" ")[2..-1]
end end
rule 'build/flexnbd-proxy' => PROXY_OBJECTS do |t|
gcc_link(t.name, t.sources)
end
rule 'build/flexnbd' => OBJECTS do |t| rule 'build/flexnbd' => OBJECTS do |t|
gcc_link(t.name, t.sources) gcc_link(t.name, t.sources)
end end
@@ -115,7 +144,6 @@ file check("client") =>
%w{build/tests/check_client.o %w{build/tests/check_client.o
build/self_pipe.o build/self_pipe.o
build/nbdtypes.o build/nbdtypes.o
build/listen.o
build/flexnbd.o build/flexnbd.o
build/flexthread.o build/flexthread.o
build/control.o build/control.o
@@ -128,6 +156,7 @@ file check("client") =>
build/mbox.o build/mbox.o
build/mirror.o build/mirror.o
build/status.o build/status.o
build/sockutil.o
build/util.o} do |t| build/util.o} do |t|
gcc_link t.name, t.prerequisites + [LIBCHECK] gcc_link t.name, t.prerequisites + [LIBCHECK]
end end
@@ -160,14 +189,38 @@ file check("serve") =>
build/flexnbd.o build/flexnbd.o
build/mirror.o build/mirror.o
build/status.o build/status.o
build/listen.o
build/acl.o build/acl.o
build/mbox.o build/mbox.o
build/ioutil.o build/ioutil.o
build/sockutil.o
build/util.o} do |t| build/util.o} do |t|
gcc_link t.name, t.prerequisites + [LIBCHECK] gcc_link t.name, t.prerequisites + [LIBCHECK]
end end
file check("status") =>
%w{
build/tests/check_status.o
build/self_pipe.o
build/nbdtypes.o
build/control.o
build/readwrite.o
build/parse.o
build/client.o
build/flexthread.o
build/serve.o
build/flexnbd.o
build/mirror.o
build/status.o
build/acl.o
build/mbox.o
build/ioutil.o
build/sockutil.o
build/util.o
} do |t|
gcc_link t.name, t.prerequisites + [LIBCHECK]
end
file check("readwrite") => file check("readwrite") =>
%w{build/tests/check_readwrite.o %w{build/tests/check_readwrite.o
build/readwrite.o build/readwrite.o
@@ -181,42 +234,22 @@ file check("readwrite") =>
build/flexnbd.o build/flexnbd.o
build/mirror.o build/mirror.o
build/status.o build/status.o
build/listen.o
build/nbdtypes.o build/nbdtypes.o
build/mbox.o build/mbox.o
build/ioutil.o build/ioutil.o
build/sockutil.o
build/util.o} do |t| build/util.o} do |t|
gcc_link t.name, t.prerequisites + [LIBCHECK] gcc_link t.name, t.prerequisites + [LIBCHECK]
end end
file check("listen") =>
%w{build/tests/check_listen.o
build/listen.o
build/flexnbd.o
build/status.o
build/flexthread.o
build/mbox.o
build/mirror.o
build/self_pipe.o
build/nbdtypes.o
build/control.o
build/readwrite.o
build/parse.o
build/client.o
build/serve.o
build/acl.o
build/ioutil.o
build/util.o} do |t|
gcc_link t.name, t.prerequisites + [LIBCHECK]
end
file check("flexnbd") => file check("flexnbd") =>
%w{build/tests/check_flexnbd.o %w{build/tests/check_flexnbd.o
build/flexnbd.o build/flexnbd.o
build/ioutil.o build/ioutil.o
build/sockutil.o
build/util.o build/util.o
build/control.o build/control.o
build/listen.o
build/mbox.o build/mbox.o
build/flexthread.o build/flexthread.o
build/status.o build/status.o
@@ -231,16 +264,17 @@ file check("flexnbd") =>
gcc_link t.name, t.prerequisites + [LIBCHECK] gcc_link t.name, t.prerequisites + [LIBCHECK]
end end
file check("control") => file check("control") =>
%w{build/tests/check_control.o} + OBJECTS - ["build/main.o"] do |t| %w{build/tests/check_control.o} + OBJECTS - ["build/main.o", 'build/proxy-main.o', 'build/proxy.o'] do |t|
gcc_link t.name, t.prerequisites + [LIBCHECK] gcc_link t.name, t.prerequisites + [LIBCHECK]
end end
(TEST_MODULES- %w{control flexnbd acl client serve readwrite listen util}).each do |m| (TEST_MODULES- %w{status control flexnbd acl client serve readwrite util}).each do |m|
tgt = "build/tests/check_#{m}.o" tgt = "build/tests/check_#{m}.o"
maybe_obj_name = "build/#{m}.o" maybe_obj_name = "build/#{m}.o"
# Take it out in case we're testing util.o or ioutil.o # Take it out in case we're testing one of the utils
deps = ["build/ioutil.o", "build/util.o"] - [maybe_obj_name] deps = ["build/ioutil.o", "build/util.o", "build/sockutil.o"] - [maybe_obj_name]
# Add it back in if it's something we need to compile # Add it back in if it's something we need to compile
deps << maybe_obj_name if OBJECTS.include?( maybe_obj_name ) deps << maybe_obj_name if OBJECTS.include?( maybe_obj_name )
@@ -255,6 +289,10 @@ OBJECTS.zip( SOURCES ).each do |o,c|
file o => [c]+headers(c) do |t| gcc_compile( o, c ) end file o => [c]+headers(c) do |t| gcc_compile( o, c ) end
end end
PROXY_ONLY_OBJECTS.zip( PROXY_ONLY_SOURCES).each do |o, c|
file o => [c]+headers(c) do |t| gcc_compile( o, c ) end
end
TEST_OBJECTS.zip( TEST_SOURCES ).each do |o,c| TEST_OBJECTS.zip( TEST_SOURCES ).each do |o,c|
file o => [c] + headers(c) do |t| gcc_compile( o, c ) end file o => [c] + headers(c) do |t| gcc_compile( o, c ) end
end end
@@ -266,7 +304,7 @@ end
namespace :pkg do namespace :pkg do
deb do |t| deb do |t|
t.code_files = ALL_SOURCES + ["Rakefile", "README.txt"] t.code_files = ALL_SOURCES + ["Rakefile", "README.txt", "README.proxy.txt"]
t.pkg_name = "flexnbd" t.pkg_name = "flexnbd"
t.generate_changelog! t.generate_changelog!
end end

15
debian/control

@@ -2,13 +2,24 @@ Source: flexnbd
Section: unknown Section: unknown
Priority: extra Priority: extra
Maintainer: Alex Young <alex@bytemark.co.uk> Maintainer: Alex Young <alex@bytemark.co.uk>
Build-Depends: cdbs, debhelper (>= 7), ruby, rake, gcc Build-Depends: cdbs, debhelper (>= 7.0.50), ruby, rake, gcc, libev-dev
Standards-Version: 3.8.1 Standards-Version: 3.8.1
Homepage: http://bigv.io/ Homepage: http://bigv.io/
Package: flexnbd Package: flexnbd
Architecture: any Architecture: any
Depends: ${shlibs:Depends}, ${misc:Depends} Depends: ${shlibs:Depends}, ${misc:Depends}, libev3
Description: FlexNBD server Description: FlexNBD server
An NBD server offering push-mirroring and intelligent sparse file handling An NBD server offering push-mirroring and intelligent sparse file handling
Package: flexnbd-dbg
Architecture: any
Section: debug
Priority: extra
Depends:
flexnbd (= ${binary:Version}),
${misc:Depends}
Description: debugging symbols for flexnbd
An NBD server offering push-mirroring and intelligent sparse file handling
.
This package contains the debugging symbols for flexnbd.


@@ -1,2 +1,5 @@
build/flexnbd usr/bin build/flexnbd usr/bin
build/flexnbd-proxy usr/bin
build/flexnbd.1.gz usr/share/man/man1 build/flexnbd.1.gz usr/share/man/man1
build/flexnbd-proxy.1.gz usr/share/man/man1

4
debian/rules

@@ -12,3 +12,7 @@ override_dh_auto_build:
override_dh_auto_clean: override_dh_auto_clean:
rake clean rake clean
.PHONY: override_dh_strip
override_dh_strip:
dh_strip --dbg-package=flexnbd-dbg
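The range and run-length helpers in the bitset changes that follow are easier to read in isolation. This sketch uses hypothetical `sketch_set_range`/`sketch_run_count` names, one bit per index, least significant bit first as in `char_with_bit_set`. It shows the three-phase range set (head bits, whole bytes via `memset`, tail bits) and a run count that stops at the first mismatching bit. As we read the new `bit_run_count`, its final loop keeps counting matching bits after a mismatch instead of returning, so it may report more than a contiguous run. The sketch also skips uniform bytes rather than casting to `uint64_t*`, which avoids depending on the `bits` array being 8-byte aligned.

```c
#include <stdint.h>
#include <string.h>

/* Bit ''idx'' lives in byte idx/8, at position idx%8 (LSB first). */
static int bit_get(const unsigned char *b, uint64_t idx)
{
    return (b[idx / 8] >> (idx % 8)) & 1;
}

static void bit_put(unsigned char *b, uint64_t idx)
{
    b[idx / 8] |= (unsigned char)(1u << (idx % 8));
}

/* Set ''len'' bits from ''from'': single bits up to a byte boundary, then
 * whole bytes with memset, then the remaining tail bits. */
void sketch_set_range(unsigned char *b, uint64_t from, uint64_t len)
{
    for (; from % 8 != 0 && len > 0; len--) {
        bit_put(b, from++);
    }
    if (len >= 8) {
        memset(b + from / 8, 0xff, len / 8);
        from += len - len % 8;   /* skip the bits memset just covered */
        len %= 8;
    }
    for (; len > 0; len--) {
        bit_put(b, from++);
    }
}

/* Count how many bits from ''from'' (at most ''len'') share the value of the
 * first one. Stops at the first mismatch, so only a contiguous run is
 * counted. Bytes that are entirely 0x00/0xff are skipped 8 bits at a time. */
uint64_t sketch_run_count(const unsigned char *b, uint64_t from, uint64_t len)
{
    int first = bit_get(b, from);
    unsigned char uniform = first ? 0xff : 0x00;
    uint64_t count = 0;

    while (count < len) {
        uint64_t pos = from + count;
        if (pos % 8 == 0 && len - count >= 8 && b[pos / 8] == uniform) {
            count += 8;                 /* whole byte matches */
        } else if (bit_get(b, pos) == first) {
            count++;
        } else {
            break;                      /* run ended */
        }
    }
    return count;
}
```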

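The new `bitset_stream` is a fixed-size circular buffer indexed by `in`, `out` and `size`. `bitset_stream_queued_bytes` sums entries with `for ( i = out; i < in ; i++ )`, which, if we read it correctly, visits nothing once `in` has wrapped below `out` or when the buffer is full (`in == out`). The sketch below uses a hypothetical four-slot `struct ring` standing in for `struct bitset_stream`, omits locking, and walks `size` slots from `out` with modular indexing instead.

```c
#include <stdint.h>

#define RING_CAPACITY 4   /* hypothetical; the header uses BITSET_STREAM_SIZE */

enum ring_event { RING_UNSET, RING_SET, RING_ON, RING_OFF };

struct ring_entry {
    enum ring_event event;
    uint64_t from;
    uint64_t len;
};

struct ring {
    struct ring_entry entries[RING_CAPACITY];
    int in;    /* next slot to write */
    int out;   /* next slot to read */
    int size;  /* number of queued entries */
};

/* Append an entry; returns 0 if full (the header blocks on a condvar instead). */
int ring_push(struct ring *r, enum ring_event ev, uint64_t from, uint64_t len)
{
    if (r->size == RING_CAPACITY) return 0;
    r->entries[r->in] = (struct ring_entry){ ev, from, len };
    r->in = (r->in + 1) % RING_CAPACITY;
    r->size++;
    return 1;
}

/* Drop the oldest entry; returns 0 if empty. */
int ring_pop(struct ring *r)
{
    if (r->size == 0) return 0;
    r->out = (r->out + 1) % RING_CAPACITY;
    r->size--;
    return 1;
}

/* Sum ''len'' over queued entries of type ''ev''. Walking ''size'' slots from
 * ''out'' handles both wrap-around (in < out) and a full ring (in == out). */
uint64_t ring_queued_bytes(const struct ring *r, enum ring_event ev)
{
    uint64_t total = 0;
    for (int n = 0; n < r->size; n++) {
        const struct ring_entry *e = &r->entries[(r->out + n) % RING_CAPACITY];
        if (e->event == ev) total += e->len;
    }
    return total;
}
```

In the real structure the loop would run with `stream->mutex` held, as the existing function does.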

@@ -8,183 +8,404 @@
#include <pthread.h> #include <pthread.h>
static inline char char_with_bit_set(int num) { return 1<<(num%8); } static inline char char_with_bit_set(uint64_t num) { return 1<<(num%8); }
/** Return 1 if the bit at ''idx'' in array ''b'' is set */ /** Return 1 if the bit at ''idx'' in array ''b'' is set */
static inline int bit_is_set(char* b, int idx) { static inline int bit_is_set(char* b, uint64_t idx) {
return (b[idx/8] & char_with_bit_set(idx)) != 0; return (b[idx/8] & char_with_bit_set(idx)) != 0;
} }
/** Return 1 if the bit at ''idx'' in array ''b'' is clear */ /** Return 1 if the bit at ''idx'' in array ''b'' is clear */
static inline int bit_is_clear(char* b, int idx) { static inline int bit_is_clear(char* b, uint64_t idx) {
return !bit_is_set(b, idx); return !bit_is_set(b, idx);
} }
/** Tests whether the bit at ''idx'' in array ''b'' has value ''value'' */ /** Tests whether the bit at ''idx'' in array ''b'' has value ''value'' */
static inline int bit_has_value(char* b, int idx, int value) { static inline int bit_has_value(char* b, uint64_t idx, int value) {
if (value) { return bit_is_set(b, idx); } if (value) { return bit_is_set(b, idx); }
else { return bit_is_clear(b, idx); } else { return bit_is_clear(b, idx); }
} }
/** Sets the bit ''idx'' in array ''b'' */ /** Sets the bit ''idx'' in array ''b'' */
static inline void bit_set(char* b, int idx) { static inline void bit_set(char* b, uint64_t idx) {
b[idx/8] |= char_with_bit_set(idx); b[idx/8] |= char_with_bit_set(idx);
//__sync_fetch_and_or(b+(idx/8), char_with_bit_set(idx)); //__sync_fetch_and_or(b+(idx/8), char_with_bit_set(idx));
} }
/** Clears the bit ''idx'' in array ''b'' */ /** Clears the bit ''idx'' in array ''b'' */
static inline void bit_clear(char* b, int idx) { static inline void bit_clear(char* b, uint64_t idx) {
b[idx/8] &= ~char_with_bit_set(idx); b[idx/8] &= ~char_with_bit_set(idx);
//__sync_fetch_and_nand(b+(idx/8), char_with_bit_set(idx)); //__sync_fetch_and_nand(b+(idx/8), char_with_bit_set(idx));
} }
/** Sets ''len'' bits in array ''b'' starting at offset ''from'' */ /** Sets ''len'' bits in array ''b'' starting at offset ''from'' */
static inline void bit_set_range(char* b, int from, int len) { static inline void bit_set_range(char* b, uint64_t from, uint64_t len)
for (; from%8 != 0 && len > 0; len--) { bit_set(b, from++); } {
if (len >= 8) { memset(b+(from/8), 255, len/8); } for ( ; from%8 != 0 && len > 0 ; len-- ) {
for (; len > 0; len--) { bit_set(b, from++); } bit_set( b, from++ );
}
if (len >= 8) {
memset(b+(from/8), 255, len/8 );
from += len;
len = (len%8);
from -= len;
}
for ( ; len > 0 ; len-- ) {
bit_set( b, from++ );
}
} }
/** Clears ''len'' bits in array ''b'' starting at offset ''from'' */ /** Clears ''len'' bits in array ''b'' starting at offset ''from'' */
static inline void bit_clear_range(char* b, int from, int len) { static inline void bit_clear_range(char* b, uint64_t from, uint64_t len)
for (; from%8 != 0 && len > 0; len--) { bit_clear(b, from++); } {
if (len >= 8) { memset(b+(from/8), 0, len/8); } for ( ; from%8 != 0 && len > 0 ; len-- ) {
for (; len > 0; len--) { bit_clear(b, from++); } bit_clear( b, from++ );
}
if (len >= 8) {
memset(b+(from/8), 0, len/8 );
from += len;
len = (len%8);
from -= len;
}
for ( ; len > 0 ; len-- ) {
bit_clear( b, from++ );
}
} }
/** Counts the number of contiguous bits in array ''b'', starting at ''from'' /** Counts the number of contiguous bits in array ''b'', starting at ''from''
* up to a maximum number of bits ''len''. Returns the number of contiguous * up to a maximum number of bits ''len''. Returns the number of contiguous
* bits that are the same as the first one specified. * bits that are the same as the first one specified. If ''run_is_set'' is
* non-NULL, the value of that bit is placed into it.
*/ */
static inline int bit_run_count(char* b, int from, int len) { static inline uint64_t bit_run_count(char* b, uint64_t from, uint64_t len, int *run_is_set) {
int count; uint64_t* current_block;
uint64_t count = 0;
int first_value = bit_is_set(b, from); int first_value = bit_is_set(b, from);
for (count=0; len > 0 && bit_has_value(b, from+count, first_value); count++, len--) if ( run_is_set != NULL ) {
; *run_is_set = first_value;
}
/* FIXME: debug this later */ for ( ; (from+count) % 64 != 0 && len > 0; len--) {
/*for (; (from+count) % 64 != 0 && len > 0; len--) if (bit_has_value(b, from+count, first_value)) {
if (bit_has_value(b, from+count, first_value))
count++; count++;
else } else {
return count; return count;
}
}
for ( ; len >= 64 ; len -= 64 ) { for ( ; len >= 64 ; len -= 64 ) {
if (*((uint64_t*)(b + ((from+count)/8))) == UINT64_MAX) current_block = (uint64_t*) (b + ((from+count)/8));
if (*current_block == ( first_value ? UINT64_MAX : 0 ) ) {
count += 64; count += 64;
else } else {
break; break;
} }
for (; len > 0; len--) }
if (bit_is_set(b, from+count))
count++;*/ for ( ; len > 0; len-- ) {
if ( bit_has_value(b, from+count, first_value) ) {
count++;
}
}
return count; return count;
} }
enum bitset_stream_events {
BITSET_STREAM_UNSET = 0,
BITSET_STREAM_SET = 1,
BITSET_STREAM_ON = 2,
BITSET_STREAM_OFF = 3
};
struct bitset_stream_entry {
enum bitset_stream_events event;
uint64_t from;
uint64_t len;
};
/** Limit the stream size to 1MB for now.
*
* If this is too small, it'll cause requests to stall as the migration lags
* behind the changes made by those requests.
*/
#define BITSET_STREAM_SIZE ( ( 1024 * 1024 ) / sizeof( struct bitset_stream_entry ) )
struct bitset_stream {
struct bitset_stream_entry entries[BITSET_STREAM_SIZE];
int in;
int out;
int size;
pthread_mutex_t mutex;
pthread_cond_t cond_not_full;
pthread_cond_t cond_not_empty;
};
/** An application of a bitset - a bitset mapping represents a file of ''size'' /** An application of a bitset - a bitset mapping represents a file of ''size''
* broken down into ''resolution''-sized chunks. The bit set is assumed to * broken down into ''resolution''-sized chunks. The bit set is assumed to
* represent one bit per chunk. * represent one bit per chunk. We also bundle a lock so that the set can be
* written reliably by multiple threads.
*/ */
struct bitset_mapping { struct bitset {
pthread_mutex_t lock;
uint64_t size; uint64_t size;
int resolution; int resolution;
struct bitset_stream *stream;
int stream_enabled;
char bits[]; char bits[];
}; };
/** Allocate a bitset_mapping for a file of the given size, and chunks of the /** Allocate a bitset for a file of the given size, and chunks of the
* given resolution. * given resolution.
*/ */
static inline struct bitset_mapping* bitset_alloc( static inline struct bitset *bitset_alloc( uint64_t size, int resolution )
uint64_t size,
int resolution
)
{ {
struct bitset_mapping *bitset = xmalloc( struct bitset *bitset = xmalloc(
sizeof(struct bitset_mapping)+ sizeof( struct bitset ) + ( size + resolution - 1 ) / resolution
(size+resolution-1)/resolution
); );
bitset->size = size; bitset->size = size;
bitset->resolution = resolution; bitset->resolution = resolution;
/* don't actually need to call pthread_mutex_destroy '*/
pthread_mutex_init(&bitset->lock, NULL);
bitset->stream = xmalloc( sizeof( struct bitset_stream ) );
pthread_mutex_init( &bitset->stream->mutex, NULL );
/* Technically don't need to call pthread_cond_destroy either */
pthread_cond_init( &bitset->stream->cond_not_full, NULL );
pthread_cond_init( &bitset->stream->cond_not_empty, NULL );
return bitset; return bitset;
} }
static inline void bitset_free( struct bitset * set )
{
/* TODO: free our mutex... */
free( set->stream );
set->stream = NULL;
free( set );
}
#define INT_FIRST_AND_LAST \ #define INT_FIRST_AND_LAST \
int first = from/set->resolution, \ uint64_t first = from/set->resolution, \
last = (from+len-1)/set->resolution, \ last = ((from+len)-1)/set->resolution, \
bitlen = last-first+1 bitlen = (last-first)+1
#define BITSET_LOCK \
FATAL_IF_NEGATIVE(pthread_mutex_lock(&set->lock), "Error locking bitset")
#define BITSET_UNLOCK \
FATAL_IF_NEGATIVE(pthread_mutex_unlock(&set->lock), "Error unlocking bitset")
static inline void bitset_stream_enqueue(
struct bitset * set,
enum bitset_stream_events event,
uint64_t from,
uint64_t len
)
{
struct bitset_stream * stream = set->stream;
pthread_mutex_lock( &stream->mutex );
while ( stream->size == BITSET_STREAM_SIZE ) {
pthread_cond_wait( &stream->cond_not_full, &stream->mutex );
}
stream->entries[stream->in].event = event;
stream->entries[stream->in].from = from;
stream->entries[stream->in].len = len;
stream->size++;
stream->in++;
stream->in %= BITSET_STREAM_SIZE;
pthread_mutex_unlock( & stream->mutex );
pthread_cond_broadcast( &stream->cond_not_empty );
return;
}
static inline void bitset_stream_dequeue(
struct bitset * set,
struct bitset_stream_entry * out
)
{
struct bitset_stream * stream = set->stream;
pthread_mutex_lock( &stream->mutex );
while ( stream->size == 0 ) {
pthread_cond_wait( &stream->cond_not_empty, &stream->mutex );
}
if ( out != NULL ) {
out->event = stream->entries[stream->out].event;
out->from = stream->entries[stream->out].from;
out->len = stream->entries[stream->out].len;
}
stream->size--;
stream->out++;
stream->out %= BITSET_STREAM_SIZE;
pthread_mutex_unlock( &stream->mutex );
pthread_cond_broadcast( &stream->cond_not_full );
return;
}
static inline size_t bitset_stream_size( struct bitset * set )
{
size_t size;
pthread_mutex_lock( &set->stream->mutex );
size = set->stream->size;
pthread_mutex_unlock( &set->stream->mutex );
return size;
}
static inline uint64_t bitset_stream_queued_bytes(
struct bitset * set,
enum bitset_stream_events event
)
{
uint64_t total = 0;
int i;
pthread_mutex_lock( &set->stream->mutex );
for ( i = set->stream->out; i < set->stream->in ; i++ ) {
if ( set->stream->entries[i].event == event ) {
total += set->stream->entries[i].len;
}
}
pthread_mutex_unlock( &set->stream->mutex );
return total;
}
static inline void bitset_enable_stream( struct bitset * set )
{
BITSET_LOCK;
set->stream_enabled = 1;
bitset_stream_enqueue( set, BITSET_STREAM_ON, 0, set->size );
BITSET_UNLOCK;
}
static inline void bitset_disable_stream( struct bitset * set )
{
BITSET_LOCK;
bitset_stream_enqueue( set, BITSET_STREAM_OFF, 0, set->size );
set->stream_enabled = 0;
BITSET_UNLOCK;
}
/** Set the bits in a bitset which correspond to the given bytes in the larger /** Set the bits in a bitset which correspond to the given bytes in the larger
* file. * file.
*/ */
static inline void bitset_set_range( static inline void bitset_set_range(
struct bitset_mapping* set, struct bitset * set,
uint64_t from, uint64_t from,
uint64_t len) uint64_t len)
{ {
INT_FIRST_AND_LAST; INT_FIRST_AND_LAST;
BITSET_LOCK;
bit_set_range(set->bits, first, bitlen); bit_set_range(set->bits, first, bitlen);
if ( set->stream_enabled ) {
bitset_stream_enqueue( set, BITSET_STREAM_SET, from, len );
}
BITSET_UNLOCK;
} }
/** Set every bit in the bitset. */ /** Set every bit in the bitset. */
static inline void bitset_set( static inline void bitset_set( struct bitset * set )
struct bitset_mapping* set
)
{ {
bitset_set_range(set, 0, set->size); bitset_set_range(set, 0, set->size);
} }
/** Clear the bits in a bitset which correspond to the given bytes in the /** Clear the bits in a bitset which correspond to the given bytes in the
* larger file. * larger file.
*/ */
static inline void bitset_clear_range( static inline void bitset_clear_range(
struct bitset_mapping* set, struct bitset * set,
uint64_t from, uint64_t from,
uint64_t len) uint64_t len)
{ {
INT_FIRST_AND_LAST; INT_FIRST_AND_LAST;
BITSET_LOCK;
bit_clear_range(set->bits, first, bitlen); bit_clear_range(set->bits, first, bitlen);
if ( set->stream_enabled ) {
bitset_stream_enqueue( set, BITSET_STREAM_UNSET, from, len );
}
BITSET_UNLOCK;
} }
/** Clear every bit in the bitset. */ /** Clear every bit in the bitset. */
static inline void bitset_clear( static inline void bitset_clear( struct bitset * set )
struct bitset_mapping *set
)
{ {
bitset_clear_range(set, 0, set->size); bitset_clear_range(set, 0, set->size);
} }
/** As per bitset_run_count but also tells you whether the run it found was set
* or unset, atomically.
*/
static inline uint64_t bitset_run_count_ex(
struct bitset * set,
uint64_t from,
uint64_t len,
int* run_is_set
)
{
uint64_t run;
/* Clip our requests to the end of the bitset, avoiding uint underflow. */
if ( from > set->size ) {
return 0;
}
len = ( len + from ) > set->size ? ( set->size - from ) : len;
INT_FIRST_AND_LAST;
BITSET_LOCK;
run = bit_run_count(set->bits, first, bitlen, run_is_set) * set->resolution;
run -= (from % set->resolution);
BITSET_UNLOCK;
return run;
}
/** Counts the number of contiguous bytes that are represented as a run in /** Counts the number of contiguous bytes that are represented as a run in
* the bit field. * the bit field.
*/ */
static inline int bitset_run_count( static inline uint64_t bitset_run_count(
struct bitset_mapping* set, struct bitset * set,
uint64_t from, uint64_t from,
uint64_t len) uint64_t len)
{ {
/* now fix in case len goes past the end of the memory we have return bitset_run_count_ex( set, from, len, NULL );
* control of */
len = len+from>set->size ? set->size-from : len;
INT_FIRST_AND_LAST;
return (bit_run_count(set->bits, first, bitlen) * set->resolution) -
(from % set->resolution);
} }
/** Tests whether the bit field is clear for the given file offset. /** Tests whether the bit field is clear for the given file offset.
*/ */
static inline int bitset_is_clear_at( static inline int bitset_is_clear_at( struct bitset * set, uint64_t at )
struct bitset_mapping* set,
uint64_t at
)
{ {
return bit_is_clear(set->bits, at/set->resolution); return bit_is_clear(set->bits, at/set->resolution);
} }
/** Tests whether the bit field is set for the given file offset. /** Tests whether the bit field is set for the given file offset.
*/ */
static inline int bitset_is_set_at( static inline int bitset_is_set_at( struct bitset * set, uint64_t at )
struct bitset_mapping* set,
uint64_t at
)
{ {
return bit_is_set(set->bits, at/set->resolution); return bit_is_set(set->bits, at/set->resolution);
} }
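bitset_stream_enqueue() and bitset_stream_dequeue() above implement a bounded producer/consumer ring buffer. It uses one mutex, a not-full condition that writers wait on, and a not-empty condition that readers wait on.

The minimal sketch below restates that pattern with hypothetical names: event_queue, and QUEUE_CAP standing in for BITSET_STREAM_SIZE. It deliberately differs from bitset_stream_queued_bytes() as shown in one respect. That function loops from `out` up to `in`, which counts nothing once `in` has wrapped past the end of the array (in < out), or when the queue is exactly full (in == out). The sketch instead walks `size` entries starting at `out`, modulo the capacity.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; QUEUE_CAP stands in for BITSET_STREAM_SIZE. */
#define QUEUE_CAP 4

enum event_kind { EV_ON, EV_SET, EV_UNSET, EV_OFF };

struct event {
    enum event_kind kind;
    uint64_t from;
    uint64_t len;
};

struct event_queue {
    struct event entries[QUEUE_CAP];
    size_t in;    /* next slot to write */
    size_t out;   /* next slot to read */
    size_t size;  /* number of entries currently queued */
    pthread_mutex_t mutex;
    pthread_cond_t not_full;
    pthread_cond_t not_empty;
};

void queue_init(struct event_queue *q)
{
    assert(q != NULL);
    q->in = q->out = q->size = 0;
    pthread_mutex_init(&q->mutex, NULL);
    pthread_cond_init(&q->not_full, NULL);
    pthread_cond_init(&q->not_empty, NULL);
}

/* Producer: blocks while the queue is full. */
void queue_push(struct event_queue *q, enum event_kind kind,
                uint64_t from, uint64_t len)
{
    pthread_mutex_lock(&q->mutex);
    while (q->size == QUEUE_CAP) {
        pthread_cond_wait(&q->not_full, &q->mutex);
    }
    q->entries[q->in] = (struct event){ kind, from, len };
    q->in = (q->in + 1) % QUEUE_CAP;
    q->size++;
    pthread_cond_broadcast(&q->not_empty);
    pthread_mutex_unlock(&q->mutex);
}

/* Consumer: blocks while the queue is empty; `out` may be NULL to discard. */
void queue_pop(struct event_queue *q, struct event *out)
{
    pthread_mutex_lock(&q->mutex);
    while (q->size == 0) {
        pthread_cond_wait(&q->not_empty, &q->mutex);
    }
    if (out != NULL) {
        *out = q->entries[q->out];
    }
    q->out = (q->out + 1) % QUEUE_CAP;
    q->size--;
    pthread_cond_broadcast(&q->not_full);
    pthread_mutex_unlock(&q->mutex);
}

/* Sum `len` over queued entries of one kind. Walking `size` entries from
 * `out`, modulo the capacity, stays correct after `in` wraps around. */
uint64_t queue_bytes(struct event_queue *q, enum event_kind kind)
{
    uint64_t total = 0;
    pthread_mutex_lock(&q->mutex);
    for (size_t n = 0; n < q->size; n++) {
        const struct event *e = &q->entries[(q->out + n) % QUEUE_CAP];
        if (e->kind == kind) {
            total += e->len;
        }
    }
    pthread_mutex_unlock(&q->mutex);
    return total;
}
```

Broadcasting before or after releasing the mutex are both valid. The diff unlocks first, while the sketch broadcasts while still holding the lock.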

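INT_FIRST_AND_LAST and bitset_run_count_ex() convert between byte ranges and bits, where each bit covers `resolution` bytes. A worked example with a resolution of 4096:

- The 200 bytes starting at offset 4000 touch bits 0 and 1, so bitlen is 2.
- A two-bit run found from offset 4000 covers 2 * 4096 - (4000 % 4096) = 4192 bytes.

The sketch below restates that arithmetic, plus the clipping done in bitset_run_count_ex(), as plain functions with hypothetical names. It asserts len > 0: when from and len are both 0, from + len - 1 wraps around to UINT64_MAX, and the macro as written does not guard against that.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper mirroring INT_FIRST_AND_LAST: which bits cover the
 * bytes [from, from + len) when each bit stands for `resolution` bytes? */
struct bit_span {
    uint64_t first;
    uint64_t last;
    uint64_t bitlen;
};

struct bit_span bytes_to_bits(uint64_t from, uint64_t len, uint64_t resolution)
{
    assert(len > 0 && resolution > 0);
    struct bit_span s;
    s.first = from / resolution;
    s.last = (from + len - 1) / resolution;
    s.bitlen = s.last - s.first + 1;
    return s;
}

/* Clip [from, from + len) to a bitset covering `size` bytes, as
 * bitset_run_count_ex() does before it looks at any bits. */
uint64_t clip_len(uint64_t from, uint64_t len, uint64_t size)
{
    if (from > size) {
        return 0;
    }
    return (from + len > size) ? size - from : len;
}

/* Convert a run of `bits` whole bits back to bytes counted from `from`.
 * The first bit also covers the bytes of its block that precede `from`,
 * so those are subtracted. */
uint64_t run_bytes(uint64_t bits, uint64_t from, uint64_t resolution)
{
    return bits * resolution - (from % resolution);
}
```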
@@ -1,12 +1,12 @@
#include "client.h" #include "client.h"
#include "serve.h" #include "serve.h"
#include "util.h"
#include "ioutil.h" #include "ioutil.h"
#include "sockutil.h"
#include "util.h"
#include "bitset.h" #include "bitset.h"
#include "nbdtypes.h" #include "nbdtypes.h"
#include "self_pipe.h" #include "self_pipe.h"
#include <sys/mman.h> #include <sys/mman.h>
#include <errno.h> #include <errno.h>
#include <stdlib.h> #include <stdlib.h>
@@ -15,26 +15,29 @@
#include <sys/stat.h> #include <sys/stat.h>
#include <fcntl.h> #include <fcntl.h>
struct client *client_create( struct server *serve, int socket ) struct client *client_create( struct server *serve, int socket )
{ {
NULLCHECK( serve ); NULLCHECK( serve );
struct client *c; struct client *c;
struct sigevent evp = {
.sigev_notify = SIGEV_SIGNAL,
.sigev_signo = CLIENT_KILLSWITCH_SIGNAL
};
c = xmalloc( sizeof( struct server ) ); c = xmalloc( sizeof( struct client ) );
c->stopped = 0; c->stopped = 0;
c->socket = socket; c->socket = socket;
c->serve = serve; c->serve = serve;
c->stop_signal = self_pipe_create(); c->stop_signal = self_pipe_create();
c->entrusted = 0; FATAL_IF_NEGATIVE(
timer_create( CLOCK_MONOTONIC, &evp, &(c->killswitch) ),
SHOW_ERRNO( "Failed to create killswitch timer" )
);
debug( "Alloced client %p (%d, %d)", c, c->stop_signal->read_fd, c->stop_signal->write_fd ); debug( "Alloced client %p with socket %d", c, socket );
return c; return c;
} }
@@ -51,8 +54,14 @@ void client_destroy( struct client *client )
{ {
NULLCHECK( client ); NULLCHECK( client );
FATAL_IF_NEGATIVE(
timer_delete( client->killswitch ),
SHOW_ERRNO( "Couldn't delete killswitch" )
);
debug( "Destroying stop signal for client %p", client ); debug( "Destroying stop signal for client %p", client );
self_pipe_destroy( client->stop_signal ); self_pipe_destroy( client->stop_signal );
debug( "Freeing client %p", client );
free( client ); free( client );
} }
@@ -61,7 +70,7 @@ void client_destroy( struct client *client )
/** /**
* So waiting on client->socket is len bytes of data, and we must write it all * So waiting on client->socket is len bytes of data, and we must write it all
* to client->mapped. However while doing do we must consult the bitmap * to client->mapped. However while doing do we must consult the bitmap
* client->block_allocation_map, which is a bitmap where one bit represents * client->serve->allocation_map, which is a bitmap where one bit represents
* block_allocation_resolution bytes. Where a bit isn't set, there are no * block_allocation_resolution bytes. Where a bit isn't set, there are no
* disc blocks allocated for that portion of the file, and we'd like to keep * disc blocks allocated for that portion of the file, and we'd like to keep
* it that way. * it that way.
@@ -70,11 +79,13 @@ void client_destroy( struct client *client )
* allocated, we can proceed as normal and make one call to writeloop. * allocated, we can proceed as normal and make one call to writeloop.
* *
*/ */
void write_not_zeroes(struct client* client, uint64_t from, int len) void write_not_zeroes(struct client* client, uint64_t from, uint64_t len)
{ {
NULLCHECK( client ); NULLCHECK( client );
NULLCHECK( client->serve );
NULLCHECK( client->serve->allocation_map );
struct bitset_mapping *map = client->serve->allocation_map; struct bitset * map = client->serve->allocation_map;
while (len > 0) { while (len > 0) {
/* so we have to calculate how much of our input to consider /* so we have to calculate how much of our input to consider
@@ -85,7 +96,7 @@ void write_not_zeroes(struct client* client, uint64_t from, int len)
* and end to get the exact number of bytes. * and end to get the exact number of bytes.
*/ */
int run = bitset_run_count(map, from, len); uint64_t run = bitset_run_count(map, from, len);
debug("write_not_zeroes: from=%ld, len=%d, run=%d", from, len, run); debug("write_not_zeroes: from=%ld, len=%d, run=%d", from, len, run);
@@ -121,7 +132,12 @@ void write_not_zeroes(struct client* client, uint64_t from, int len)
debug("writing the lot: from=%ld, run=%d", from, run); debug("writing the lot: from=%ld, run=%d", from, run);
/* already allocated, just write it all */ /* already allocated, just write it all */
DO_READ(client->mapped + from, run); DO_READ(client->mapped + from, run);
server_dirty(client->serve, from, run); /* We know from our earlier call to bitset_run_count that the
* bitset is all-1s at this point, but we need to dirty it for the
* sake of the event stream - the actual bytes have changed, and we
* are interested in that fact.
*/
bitset_set_range( map, from, run );
len -= run; len -= run;
from += run; from += run;
} }
@@ -129,7 +145,7 @@ void write_not_zeroes(struct client* client, uint64_t from, int len)
char zerobuffer[block_allocation_resolution]; char zerobuffer[block_allocation_resolution];
/* not allocated, read in block_allocation_resoution */ /* not allocated, read in block_allocation_resoution */
while (run > 0) { while (run > 0) {
int blockrun = block_allocation_resolution - uint64_t blockrun = block_allocation_resolution -
(from % block_allocation_resolution); (from % block_allocation_resolution);
if (blockrun > run) if (blockrun > run)
blockrun = run; blockrun = run;
@@ -141,11 +157,13 @@ void write_not_zeroes(struct client* client, uint64_t from, int len)
* and memcpy being fast, rather than try to * and memcpy being fast, rather than try to
* hand-optimized something specific. * hand-optimized something specific.
*/ */
if (zerobuffer[0] != 0 ||
memcmp(zerobuffer, zerobuffer + 1, blockrun - 1)) { int all_zeros = (zerobuffer[0] == 0) &&
(0 == memcmp( zerobuffer, zerobuffer+1, blockrun-1 ));
if ( !all_zeros ) {
memcpy(client->mapped+from, zerobuffer, blockrun); memcpy(client->mapped+from, zerobuffer, blockrun);
bitset_set_range(map, from, blockrun); bitset_set_range(map, from, blockrun);
server_dirty(client->serve, from, blockrun);
/* at this point we could choose to /* at this point we could choose to
* short-cut the rest of the write for * short-cut the rest of the write for
* faster I/O but by continuing to do it * faster I/O but by continuing to do it
@@ -153,6 +171,10 @@ void write_not_zeroes(struct client* client, uint64_t from, int len)
* sparseness as possible. * sparseness as possible.
*/ */
} }
/* When the block is all_zeroes, no bytes have changed, so we
* don't need to put an event into the bitset stream. This may
* be surprising in the future.
*/
len -= blockrun; len -= blockrun;
run -= blockrun; run -= blockrun;
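The all_zeros test in write_not_zeroes() compares the block with itself shifted by one byte. If every byte equals its successor and the first byte is zero, the whole block is zero, so a single memcmp() call replaces a hand-written loop.

The standalone sketch below shows that check. is_all_zeros is a hypothetical name, and unlike the inline version it guards the empty-buffer case explicitly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if buf[0..n) contains only zero bytes. memcmp(buf, buf + 1,
 * n - 1) == 0 means every byte equals the next one; combined with
 * buf[0] == 0 that makes every byte zero. */
int is_all_zeros(const char *buf, size_t n)
{
    assert(buf != NULL || n == 0);
    if (n == 0) {
        return 1;
    }
    return buf[0] == 0 && memcmp(buf, buf + 1, n - 1) == 0;
}
```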
@@ -178,19 +200,24 @@ int client_read_request( struct client * client , struct nbd_request *out_reques
struct nbd_request_raw request_raw; struct nbd_request_raw request_raw;
fd_set fds; fd_set fds;
struct timeval tv = {CLIENT_MAX_WAIT_SECS, 0}; struct timeval * ptv = NULL;
struct timeval * ptv;
int fd_count; int fd_count;
/* We want a timeout if this is an inbound migration, but not /* We want a timeout if this is an inbound migration, but not otherwise.
* otherwise * This is compile-time selectable, as it will break mirror max_bps
*/ */
ptv = server_is_in_control( client->serve ) ? NULL : &tv; #ifdef HAS_LISTEN_TIMEOUT
struct timeval tv = {CLIENT_MAX_WAIT_SECS, 0};
if ( !server_is_in_control( client->serve ) ) {
ptv = &tv;
}
#endif
FD_ZERO(&fds); FD_ZERO(&fds);
FD_SET(client->socket, &fds); FD_SET(client->socket, &fds);
self_pipe_fd_set( client->stop_signal, &fds ); self_pipe_fd_set( client->stop_signal, &fds );
fd_count = select(FD_SETSIZE, &fds, NULL, NULL, ptv); fd_count = sock_try_select(FD_SETSIZE, &fds, NULL, NULL, ptv);
if ( fd_count == 0 ) { if ( fd_count == 0 ) {
/* This "can't ever happen" */ /* This "can't ever happen" */
if ( NULL == ptv ) { fatal( "No FDs selected, and no timeout!" ); } if ( NULL == ptv ) { fatal( "No FDs selected, and no timeout!" ); }
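client_read_request() waits with select() on both the client socket and the read end of the stop_signal self-pipe. It passes a NULL timeout, meaning block forever, unless HAS_LISTEN_TIMEOUT is defined and the server is not in control. Writing a byte to the pipe from another thread wakes the select() immediately.

The sketch below shows the same wait with plain POSIX calls; wait_sock_or_stop and its return codes are hypothetical. It retries on EINTR. That is only an assumption about what a wrapper such as sock_try_select() might do, since its definition is not part of this diff.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: wait until `sock` is readable or `stop_fd` (the read
 * end of a self-pipe) has data. A NULL timeout blocks indefinitely.
 * Returns 1 if sock is readable, 0 if stopped, -1 on timeout, -2 on error. */
int wait_sock_or_stop(int sock, int stop_fd, struct timeval *timeout)
{
    fd_set fds;
    int maxfd = sock > stop_fd ? sock : stop_fd;
    int n;

    assert(sock >= 0 && stop_fd >= 0 && maxfd < FD_SETSIZE);
    do {
        /* select() overwrites the set, so rebuild it on every attempt. */
        FD_ZERO(&fds);
        FD_SET(sock, &fds);
        FD_SET(stop_fd, &fds);
        n = select(maxfd + 1, &fds, NULL, NULL, timeout);
    } while (n < 0 && errno == EINTR);

    if (n < 0) {
        return -2;
    }
    if (n == 0) {
        return -1;
    }
    /* A stop request wins even if client data is also pending. */
    if (FD_ISSET(stop_fd, &fds)) {
        return 0;
    }
    return 1;
}
```

Passing maxfd + 1 instead of FD_SETSIZE gives the same result but lets the kernel scan fewer descriptors.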
@@ -242,8 +269,9 @@ int fd_write_reply( int fd, char *handle, int error )
memcpy( reply.handle, handle, 8 ); memcpy( reply.handle, handle, 8 );
nbd_h2r_reply( &reply, &reply_raw ); nbd_h2r_reply( &reply, &reply_raw );
debug( "Replying with %s, %d", handle, error );
if( -1 == write( fd, &reply_raw, sizeof( reply_raw ) ) ) { if( -1 == writeloop( fd, &reply_raw, sizeof( reply_raw ) ) ) {
switch( errno ) { switch( errno ) {
case ECONNRESET: case ECONNRESET:
error( "Connection reset while writing reply" ); error( "Connection reset while writing reply" );
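This hunk replaces write() with writeloop() when sending the reply header. A single write() on a socket may transfer fewer bytes than requested, and a short count is not reported as an error. The old comparison against -1 could therefore miss a truncated reply.

writeloop() itself is defined elsewhere. The sketch below is a hypothetical write_all() that assumes the contract the caller relies on: it returns -1 with errno set on failure, so the switch on errno keeps working, and it retries after EINTR.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical stand-in for a writeloop()-style helper: keep calling
 * write(2) until all `len` bytes are out. Returns 0 on success, or -1
 * with errno set by the failing write(). */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    assert(buf != NULL || len == 0);
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR) {
                continue; /* interrupted before writing anything; retry */
            }
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```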
@@ -341,46 +369,46 @@ void client_flush( struct client * client, size_t len )
* Returns 1 if we do, 0 otherwise. * Returns 1 if we do, 0 otherwise.
* request_err is set to 0 if the client sent a bad request, in which * request_err is set to 0 if the client sent a bad request, in which
* case we drop the connection. * case we drop the connection.
* FIXME: after an ENTRUST, there's no way to distinguish between a
* DISCONNECT and any bad request.
*/ */
int client_request_needs_reply( struct client * client, int client_request_needs_reply( struct client * client,
struct nbd_request request ) struct nbd_request request )
{ {
debug("request type %d", request.type); /* The client is stupid, but don't take down the whole server as a result.
* We send a reply before disconnecting so that at least some indication of
* the problem is visible, and so proxies don't retry the same (bad) request
* forever.
*/
if (request.magic != REQUEST_MAGIC) { if (request.magic != REQUEST_MAGIC) {
fatal("Bad magic %08x", request.magic); warn("Bad magic 0x%08x from client", request.magic);
client_write_reply( client, &request, EBADMSG );
client->disconnect = 1; // no need to flush
return 0;
} }
debug(
"request type=%"PRIu32", from=%"PRIu64", len=%"PRIu32,
request.type, request.from, request.len
);
/* check it's not out of range */
if ( request.from+request.len > client->serve->size) {
warn("write request %"PRIu64"+%"PRIu32" out of range",
request.from, request.len
);
if ( request.type == REQUEST_WRITE ) {
client_flush( client, request.len );
}
client_write_reply( client, &request, EPERM ); /* TODO: Change to ERANGE ? */
client->disconnect = 0;
return 0;
}
switch (request.type) switch (request.type)
{ {
case REQUEST_READ: case REQUEST_READ:
ERROR_IF( client->entrusted,
"Received a read request "
"after an entrust message.");
break; break;
case REQUEST_WRITE: case REQUEST_WRITE:
ERROR_IF( client->entrusted,
"Received a write request "
"after an entrust message.");
/* check it's not out of range */
if ( request.from+request.len > client->serve->size) {
warn("write request %d+%d out of range",
request.from,
request.len
);
client_write_reply( client, &request, 1 );
client_flush( client, request.len );
client->disconnect = 0;
return 0;
}
break;
case REQUEST_ENTRUST:
/* Yes, we need to reply to an entrust, but we take no
* further action */
debug("request entrust");
break; break;
case REQUEST_DISCONNECT: case REQUEST_DISCONNECT:
debug("request disconnect"); debug("request disconnect");
@@ -394,19 +422,6 @@ int client_request_needs_reply( struct client * client,
} }
void client_reply_to_entrust( struct client * client, struct nbd_request request )
{
/* An entrust needs a response, but has no data. */
debug( "request entrust" );
client_write_reply( client, &request, 0 );
/* We set this after trying to send the reply, so we know the
* reply got away safely.
*/
client->entrusted = 1;
}
void client_reply_to_read( struct client* client, struct nbd_request request ) void client_reply_to_read( struct client* client, struct nbd_request request )
{ {
off64_t offset; off64_t offset;
@@ -434,12 +449,12 @@ void client_reply_to_read( struct client* client, struct nbd_request request )
void client_reply_to_write( struct client* client, struct nbd_request request ) void client_reply_to_write( struct client* client, struct nbd_request request )
{ {
debug("request write %ld+%d", request.from, request.len); debug("request write %ld+%d", request.from, request.len);
if (client->serve->allocation_map) { if (client->serve->allocation_map_built) {
write_not_zeroes( client, request.from, request.len ); write_not_zeroes( client, request.from, request.len );
} }
else { else {
debug("No allocation map, writing directly."); debug("No allocation map, writing directly.");
/* If we get cut off partway through reading this data /* If we get cut off partway through reading this data:
* */ * */
ERROR_IF_NEGATIVE( ERROR_IF_NEGATIVE(
readloop( client->socket, readloop( client->socket,
@@ -449,7 +464,12 @@ void client_reply_to_write( struct client* client, struct nbd_request request )
request.from, request.from,
request.len request.len
); );
server_dirty(client->serve, request.from, request.len);
/* the allocation_map is shared between client threads, and may be
* being built. We need to reflect the write in it, as it may be in
* a position the builder has already gone over.
*/
bitset_set_range(client->serve->allocation_map, request.from, request.len);
} }
if (1) /* not sure whether this is necessary... */ if (1) /* not sure whether this is necessary... */
@@ -461,7 +481,7 @@ void client_reply_to_write( struct client* client, struct nbd_request request )
FATAL_IF_NEGATIVE( FATAL_IF_NEGATIVE(
msync( client->mapped + from_rounded, msync( client->mapped + from_rounded,
len_rounded, len_rounded,
MS_SYNC), MS_SYNC | MS_INVALIDATE),
"msync failed %ld %ld", request.from, request.len "msync failed %ld %ld", request.from, request.len
); );
} }
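The msync() call above works on from_rounded and len_rounded, whose computation falls outside the lines shown. POSIX requires the address passed to msync() to be a multiple of the page size, so a write's byte range has to be widened to whole pages first.

The sketch below shows one way to do that rounding; page_round is a hypothetical name. It assumes a power-of-two page size and a page-aligned mapping base, which is what mmap() returns.

```c
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical helper: widen [from, from + len) to whole pages by rounding
 * the start down and the end up to multiples of `page`. */
void page_round(uint64_t from, uint64_t len, uint64_t page,
                uint64_t *from_rounded, uint64_t *len_rounded)
{
    assert(page > 0 && (page & (page - 1)) == 0); /* power of two */
    uint64_t end = from + len;
    uint64_t end_rounded = (end + page - 1) & ~(page - 1);

    *from_rounded = from & ~(page - 1);
    *len_rounded = end_rounded - *from_rounded;
}
```

In real code the page size would come from sysconf(_SC_PAGESIZE) rather than a constant.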
@@ -478,36 +498,91 @@ void client_reply( struct client* client, struct nbd_request request )
case REQUEST_WRITE: case REQUEST_WRITE:
client_reply_to_write( client, request ); client_reply_to_write( client, request );
break; break;
case REQUEST_ENTRUST:
client_reply_to_entrust( client, request );
break;
} }
} }
/* Starts a timer that will kill the whole process if disarm is not called
* within a timeout (see CLIENT_HANDLE_TIMEOUT).
*/
void client_arm_killswitch( struct client* client )
{
struct itimerspec its = {
.it_value = { .tv_nsec = 0, .tv_sec = CLIENT_HANDLER_TIMEOUT },
.it_interval = { .tv_nsec = 0, .tv_sec = 0 }
};
if ( !client->serve->use_killswitch ) {
return;
}
debug( "Arming killswitch" );
FATAL_IF_NEGATIVE(
timer_settime( client->killswitch, 0, &its, NULL ),
SHOW_ERRNO( "Failed to arm killswitch" )
);
return;
}
void client_disarm_killswitch( struct client* client )
{
struct itimerspec its = {
.it_value = { .tv_nsec = 0, .tv_sec = 0 },
.it_interval = { .tv_nsec = 0, .tv_sec = 0 }
};
if ( !client->serve->use_killswitch ) {
return;
}
debug( "Disarming killswitch" );
FATAL_IF_NEGATIVE(
timer_settime( client->killswitch, 0, &its, NULL ),
SHOW_ERRNO( "Failed to disarm killswitch" )
);
return;
}
/* Returns 0 if we should continue trying to serve requests */ /* Returns 0 if we should continue trying to serve requests */
int client_serve_request(struct client* client) int client_serve_request(struct client* client)
{ {
struct nbd_request request = {0}; struct nbd_request request = {0};
int failure = 1; int stop = 1;
int disconnected = 0; int disconnected = 0;
if ( !client_read_request( client, &request, &disconnected ) ) { return failure; } if ( !client_read_request( client, &request, &disconnected ) ) { return stop; }
if ( disconnected ) { return failure; } if ( disconnected ) { return stop; }
if ( !client_request_needs_reply( client, request ) ) { if ( !client_request_needs_reply( client, request ) ) {
return client->disconnect; return client->disconnect;
} }
server_lock_io( client->serve );
{ {
if ( !server_is_closed( client->serve ) ) { if ( !server_is_closed( client->serve ) ) {
/* We arm / disarm around client_reply() to catch cases where the
* remote peer sends part of a write request data before dying,
* and cases where we send part of read reply data before they die.
*
* That last is theoretical right now, but could break us in the
* same way as a half-write (which causes us to sit in read forever)
*
* We only arm/disarm inside the server io lock because it's common
* during migrations for us to be hanging on that mutex for quite
* a while while the final pass happens - it's held for the entire
* time.
*/
client_arm_killswitch( client );
client_reply( client, request ); client_reply( client, request );
failure = 0; client_disarm_killswitch( client );
stop = 0;
} }
} }
server_unlock_io( client->serve );
return failure;
return stop;
} }
@@ -521,13 +596,24 @@ void client_cleanup(struct client* client,
{ {
info("client cleanup for client %p", client); info("client cleanup for client %p", client);
if (client->socket) { close(client->socket); } if (client->socket) {
FATAL_IF_NEGATIVE( close(client->socket),
"Error closing client socket %d",
client->socket );
debug("Closed client socket fd %d", client->socket);
client->socket = -1;
}
if (client->mapped) { if (client->mapped) {
munmap(client->mapped, client->serve->size); munmap(client->mapped, client->serve->size);
} }
if (client->fileno) { close(client->fileno); } if (client->fileno) {
FATAL_IF_NEGATIVE( close(client->fileno),
"Error closing file %d",
client->fileno );
debug("Closed client file fd %d", client->fileno );
client->fileno = -1;
}
if ( server_io_locked( client->serve ) ) { server_unlock_io( client->serve ); }
if ( server_acl_locked( client->serve ) ) { server_unlock_acl( client->serve ); } if ( server_acl_locked( client->serve ) ) { server_unlock_acl( client->serve ); }
} }
@@ -548,6 +634,13 @@ void* client_serve(void* client_uncast)
), ),
"Couldn't open/mmap file %s: %s", client->serve->filename, strerror( errno ) "Couldn't open/mmap file %s: %s", client->serve->filename, strerror( errno )
); );
FATAL_IF_NEGATIVE(
madvise( client->mapped, client->serve->size, MADV_RANDOM ),
SHOW_ERRNO( "Failed to madvise() %s", client->serve->filename )
);
debug( "Opened client file fd %d", client->fileno);
debug("client: sending hello"); debug("client: sending hello");
client_send_hello(client); client_send_hello(client);
@@ -557,21 +650,10 @@ void* client_serve(void* client_uncast)
debug("client: stopped serving requests"); debug("client: stopped serving requests");
client->stopped = 1; client->stopped = 1;
if ( client->entrusted ) {
if ( client->disconnect ){ if ( client->disconnect ){
debug("client: control arrived" ); debug("client: control arrived" );
server_control_arrived( client->serve ); server_control_arrived( client->serve );
} }
else {
warn( "client: control transfer failed." );
}
}
FATAL_IF_NEGATIVE(
close(client->socket),
"Couldn't close socket %d",
client->socket
);
debug("Cleaning client %p up normally in thread %p", client, pthread_self()); debug("Cleaning client %p up normally in thread %p", client, pthread_self());
client_cleanup(client, 0); client_cleanup(client, 0);
@@ -579,3 +661,4 @@ void* client_serve(void* client_uncast)
return NULL; return NULL;
} }

@@ -1,6 +1,11 @@
#ifndef CLIENT_H #ifndef CLIENT_H
#define CLIENT_H #define CLIENT_H
#include <signal.h>
#include <time.h>
#ifdef HAS_LISTEN_TIMEOUT
/** CLIENT_MAX_WAIT_SECS /** CLIENT_MAX_WAIT_SECS
* This is the length of time an inbound migration will wait for a fresh * This is the length of time an inbound migration will wait for a fresh
* write before assuming the source has Gone Away. Note: it is *not* * write before assuming the source has Gone Away. Note: it is *not*
@@ -9,6 +14,21 @@
*/ */
#define CLIENT_MAX_WAIT_SECS 5 #define CLIENT_MAX_WAIT_SECS 5
#endif
/** CLIENT_HANDLER_TIMEOUT
* This is the length of time (in seconds) any request can be outstanding for.
* If we spend longer than this in a request, the whole server is killed.
*/
#define CLIENT_HANDLER_TIMEOUT 120
/** CLIENT_KILLSWITCH_SIGNAL
* The signal number we use to kill the server when *any* killswitch timer
* fires. We don't actually need to install a signal handler for it, the default
* behaviour is perfectly fine.
*/
#define CLIENT_KILLSWITCH_SIGNAL ( SIGRTMIN + 1 )
struct client { struct client {
/* When we call pthread_join, if the thread is already dead /* When we call pthread_join, if the thread is already dead
@@ -28,11 +48,14 @@ struct client {
struct server* serve; /* FIXME: remove above duplication */ struct server* serve; /* FIXME: remove above duplication */
/* Have we seen a REQUEST_ENTRUST message? */
int entrusted;
/* Have we seen a REQUEST_DISCONNECT message? */ /* Have we seen a REQUEST_DISCONNECT message? */
int disconnect; int disconnect;
/* kill the whole server if a request has been outstanding too long,
* assuming use_killswitch is set in serve
*/
timer_t killswitch;
}; };
@@ -42,3 +65,4 @@ void client_destroy( struct client * client );
void client_signal_stop( struct client * client ); void client_signal_stop( struct client * client );
#endif #endif

@@ -54,6 +54,7 @@ struct control * control_create(
control->flexnbd = flexnbd; control->flexnbd = flexnbd;
control->socket_name = csn; control->socket_name = csn;
control->open_signal = self_pipe_create();
control->close_signal = self_pipe_create(); control->close_signal = self_pipe_create();
control->mirror_state_mbox = mbox_create(); control->mirror_state_mbox = mbox_create();
@@ -75,6 +76,7 @@ void control_destroy( struct control * control )
mbox_destroy( control->mirror_state_mbox ); mbox_destroy( control->mirror_state_mbox );
self_pipe_destroy( control->close_signal ); self_pipe_destroy( control->close_signal );
self_pipe_destroy( control->open_signal );
free( control ); free( control );
} }
@@ -205,10 +207,23 @@ void control_listen(struct control* control)
control->control_fd = open_control_socket( control->socket_name ); control->control_fd = open_control_socket( control->socket_name );
} }
void control_wait_for_open_signal( struct control * control )
{
fd_set fds;
FD_ZERO( &fds );
self_pipe_fd_set( control->open_signal, &fds );
FATAL_IF_NEGATIVE( select( FD_SETSIZE, &fds, NULL, NULL, NULL ),
"select() failed" );
self_pipe_signal_clear( control->open_signal );
}
void control_serve( struct control * control ) void control_serve( struct control * control )
{ {
NULLCHECK( control ); NULLCHECK( control );
control_wait_for_open_signal( control );
control_listen( control ); control_listen( control );
while( control_accept( control ) ); while( control_accept( control ) );
} }
@@ -235,7 +250,7 @@ void * control_runner( void * control_uncast )
control_serve( control ); control_serve( control );
control_cleanup( control, 0 ); control_cleanup( control, 0 );
return NULL; pthread_exit( NULL );
} }
@@ -260,6 +275,9 @@ void control_write_mirror_response( enum mirror_state mirror_state, int client_f
case MS_FAIL_SIZE_MISMATCH: case MS_FAIL_SIZE_MISMATCH:
write_socket( "1: Remote size does not match local size" ); write_socket( "1: Remote size does not match local size" );
break; break;
case MS_ABANDONED:
write_socket( "1: Mirroring abandoned" );
break;
case MS_GO: case MS_GO:
case MS_DONE: /* Yes, I know we know better, but it's simpler this way */ case MS_DONE: /* Yes, I know we know better, but it's simpler this way */
write_socket( "0: Mirror started" ); write_socket( "0: Mirror started" );
@@ -292,7 +310,6 @@ enum mirror_state control_client_mirror_wait(
return mirror_state; return mirror_state;
} }
#define write_socket(msg) write(client->socket, (msg "\n"), strlen((msg))+1) #define write_socket(msg) write(client->socket, (msg "\n"), strlen((msg))+1)
/** Command parser to start mirror process from socket input */ /** Command parser to start mirror process from socket input */
int control_mirror(struct control_client* client, int linesc, char** lines) int control_mirror(struct control_client* client, int linesc, char** lines)
@@ -302,7 +319,7 @@ int control_mirror(struct control_client* client, int linesc, char** lines)
struct flexnbd * flexnbd = client->flexnbd; struct flexnbd * flexnbd = client->flexnbd;
union mysockaddr *connect_to = xmalloc( sizeof( union mysockaddr ) ); union mysockaddr *connect_to = xmalloc( sizeof( union mysockaddr ) );
union mysockaddr *connect_from = NULL; union mysockaddr *connect_from = NULL;
uint64_t max_Bps = 0; uint64_t max_Bps = UINT64_MAX;
int action_at_finish; int action_at_finish;
int raw_port; int raw_port;
@@ -324,22 +341,15 @@ int control_mirror(struct control_client* client, int linesc, char** lines)
} }
connect_to->v4.sin_port = htobe16(raw_port); connect_to->v4.sin_port = htobe16(raw_port);
action_at_finish = ACTION_EXIT;
if (linesc > 2) { if (linesc > 2) {
connect_from = xmalloc( sizeof( union mysockaddr ) ); if (strcmp("exit", lines[2]) == 0) {
if (parse_ip_to_sockaddr(&connect_from->generic, lines[2]) == 0) {
write_socket("1: bad bind address");
return -1;
}
}
if (linesc > 3) { max_Bps = atoi(lines[2]); }
action_at_finish = ACTION_EXIT;
if (linesc > 4) {
if (strcmp("exit", lines[3]) == 0) {
action_at_finish = ACTION_EXIT; action_at_finish = ACTION_EXIT;
} }
else if (strcmp("nothing", lines[3]) == 0) { else if (strcmp( "unlink", lines[2]) == 0 ) {
action_at_finish = ACTION_UNLINK;
}
else if (strcmp("nothing", lines[2]) == 0) {
action_at_finish = ACTION_NOTHING; action_at_finish = ACTION_NOTHING;
} }
else { else {
@@ -348,19 +358,37 @@ int control_mirror(struct control_client* client, int linesc, char** lines)
} }
} }
if (linesc > 3) {
connect_from = xmalloc( sizeof( union mysockaddr ) );
if (parse_ip_to_sockaddr(&connect_from->generic, lines[3]) == 0) {
write_socket("1: bad bind address");
return -1;
}
}
if (linesc > 4) {
errno = 0;
max_Bps = strtoull( lines[4], NULL, 10 );
if ( errno == ERANGE ) {
write_socket( "1: max_bps out of range" );
return -1;
} else if ( errno != 0 ) {
write_socket( "1: max_bps couldn't be parsed" );
return -1;
}
}
if (linesc > 5) { if (linesc > 5) {
write_socket("1: unrecognised parameters to mirror"); write_socket("1: unrecognised parameters to mirror");
return -1; return -1;
} }
/* In theory, we should never have to worry about the switch
* lock here, since we should never be able to start more than
* one mirror at a time. This is enforced by only accepting a
* single client at a time on the control socket.
*/
flexnbd_lock_switch( flexnbd );
{
struct server * serve = flexnbd_server(flexnbd); struct server * serve = flexnbd_server(flexnbd);
server_lock_start_mirror( serve );
{
if ( server_mirror_can_start( serve ) ) {
serve->mirror_super = mirror_super_create( serve->mirror_super = mirror_super_create(
serve->filename, serve->filename,
connect_to, connect_to,
@@ -369,7 +397,24 @@ int control_mirror(struct control_client* client, int linesc, char** lines)
action_at_finish, action_at_finish,
client->mirror_state_mbox ); client->mirror_state_mbox );
serve->mirror = serve->mirror_super->mirror; serve->mirror = serve->mirror_super->mirror;
server_prevent_mirror_start( serve );
} else {
if ( serve->mirror_super ) {
warn( "Tried to start a second mirror run" );
write_socket( "1: mirror already running" );
} else {
warn( "Cannot start mirroring, shutting down" );
write_socket( "1: shutting down" );
}
}
}
server_unlock_start_mirror( serve );
/* Do this outside the lock to minimise the length of time the
* sighandler can block the serve thread
*/
if ( serve->mirror_super ) {
FATAL_IF( 0 != pthread_create( FATAL_IF( 0 != pthread_create(
&serve->mirror_super->thread, &serve->mirror_super->thread,
NULL, NULL,
@@ -385,15 +430,47 @@ int control_mirror(struct control_client* client, int linesc, char** lines)
debug("Control thread writing response"); debug("Control thread writing response");
control_write_mirror_response( state, client->socket ); control_write_mirror_response( state, client->socket );
} }
debug( "Control thread unlocking switch" );
flexnbd_unlock_switch( flexnbd );
debug( "Control thread going away." ); debug( "Control thread going away." );
return 0; return 0;
} }
#undef write_socket int control_mirror_max_bps( struct control_client* client, int linesc, char** lines )
{
NULLCHECK( client );
NULLCHECK( client->flexnbd );
struct server* serve = flexnbd_server( client->flexnbd );
uint64_t max_Bps;
if ( !serve->mirror_super ) {
write_socket( "1: Not currently mirroring" );
return -1;
}
if ( linesc != 1 ) {
write_socket( "1: Bad format" );
return -1;
}
errno = 0;
max_Bps = strtoull( lines[0], NULL, 10 );
if ( errno == ERANGE ) {
write_socket( "1: max_bps out of range" );
return -1;
} else if ( errno != 0 ) {
write_socket( "1: max_bps couldn't be parsed" );
return -1;
}
serve->mirror->max_bytes_per_second = max_Bps;
write_socket( "0: updated" );
return 0;
}
#undef write_socket
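Both control_mirror and control_mirror_max_bps call strtoull() with a NULL end pointer. As a result, trailing junk such as "12x" is silently accepted as 12. Input with no digits at all may come back as 0 without errno being set. "-1" wraps to the largest representable value. Here is a minimal sketch of stricter parsing, using a hypothetical parse_u64_strict() helper that is not part of flexnbd:

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: parse a non-negative decimal integer such as a
 * max_bps argument. Returns 0 on success, -1 if the text is empty, does
 * not start with a digit, has trailing junk, or overflows. */
int parse_u64_strict( const char *text, uint64_t *out )
{
    char *end = NULL;
    unsigned long long value;

    /* strtoull() skips leading whitespace and accepts a sign, silently
     * wrapping "-1" to the maximum value; insist on a leading digit. */
    if ( !isdigit( (unsigned char) text[0] ) ) { return -1; }

    errno = 0;
    value = strtoull( text, &end, 10 );
    if ( errno != 0 ) { return -1; }    /* ERANGE on overflow */
    if ( *end != '\0' ) { return -1; }  /* trailing junk such as "12x" */

    *out = (uint64_t) value;
    return 0;
}
```

A handler could then reply "1: max_bps couldn't be parsed" whenever the helper returns -1.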
/** Command parser to alter access control list from socket input */ /** Command parser to alter access control list from socket input */
int control_acl(struct control_client* client, int linesc, char** lines) int control_acl(struct control_client* client, int linesc, char** lines)
@@ -406,6 +483,7 @@ int control_acl(struct control_client* client, int linesc, char** lines)
struct acl * new_acl = acl_create( linesc, lines, default_deny ); struct acl * new_acl = acl_create( linesc, lines, default_deny );
if (new_acl->len != linesc) { if (new_acl->len != linesc) {
warn("Bad ACL spec: %s", lines[new_acl->len] );
write(client->socket, "1: bad spec: ", 13); write(client->socket, "1: bad spec: ", 13);
write(client->socket, lines[new_acl->len], write(client->socket, lines[new_acl->len],
strlen(lines[new_acl->len])); strlen(lines[new_acl->len]));
@@ -414,12 +492,59 @@ int control_acl(struct control_client* client, int linesc, char** lines)
} }
else { else {
flexnbd_replace_acl( flexnbd, new_acl ); flexnbd_replace_acl( flexnbd, new_acl );
write( client->socket, "0: updated", 10); info("ACL set");
write( client->socket, "0: updated\n", 11);
} }
return 0; return 0;
} }
int control_break(
struct control_client* client,
int linesc __attribute__ ((unused)),
char** lines __attribute__((unused))
)
{
NULLCHECK( client );
NULLCHECK( client->flexnbd );
int result = 0;
struct flexnbd* flexnbd = client->flexnbd;
struct server * serve = flexnbd_server( flexnbd );
server_lock_start_mirror( serve );
{
if ( server_is_mirroring( serve ) ) {
info( "Signaling to abandon mirror" );
server_abandon_mirror( serve );
debug( "Abandon signaled" );
if ( server_is_closed( serve ) ) {
info( "Mirror completed while canceling" );
write( client->socket,
"1: mirror completed\n", 20 );
}
else {
info( "Mirror successfully stopped." );
write( client->socket,
"0: mirror stopped\n", 18 );
result = 1;
}
} else {
warn( "Not mirroring." );
write( client->socket, "1: not mirroring\n", 17 );
}
}
server_unlock_start_mirror( serve );
return result;
}
/** FIXME: add some useful statistics */ /** FIXME: add some useful statistics */
int control_status( int control_status(
struct control_client* client, struct control_client* client,
@@ -444,9 +569,7 @@ void control_client_cleanup(struct control_client* client,
if (client->socket) { close(client->socket); } if (client->socket) { close(client->socket); }
/* This is wrongness */ /* This is wrongness */
if ( server_io_locked( client->flexnbd->serve ) ) { server_unlock_io( client->flexnbd->serve ); }
if ( server_acl_locked( client->flexnbd->serve ) ) { server_unlock_acl( client->flexnbd->serve ); } if ( server_acl_locked( client->flexnbd->serve ) ) { server_unlock_acl( client->flexnbd->serve ); }
if ( flexnbd_switch_locked( client->flexnbd ) ) { flexnbd_unlock_switch( client->flexnbd ); }
control_client_destroy( client ); control_client_destroy( client );
} }
@@ -478,11 +601,22 @@ void control_respond(struct control_client * client)
debug("mirror command failed"); debug("mirror command failed");
} }
} }
else if (strcmp(lines[0], "break") == 0) {
info( "break command received" );
if ( control_break( client, linesc-1, lines+1) < 0) {
debug( "break command failed" );
}
}
else if (strcmp(lines[0], "status") == 0) { else if (strcmp(lines[0], "status") == 0) {
info("status command received" ); info("status command received" );
if (control_status(client, linesc-1, lines+1) < 0) { if (control_status(client, linesc-1, lines+1) < 0) {
debug("status command failed"); debug("status command failed");
} }
} else if ( strcmp( lines[0], "mirror_max_bps" ) == 0 ) {
info( "mirror_max_bps command received" );
if( control_mirror_max_bps( client, linesc-1, lines+1 ) < 0 ) {
debug( "mirror_max_bps command failed" );
}
} }
else { else {
write(client->socket, "10: unknown command\n", 23); write(client->socket, "10: unknown command\n", 23);
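The control handlers pass hand-counted lengths to write(), and these counts are easy to get wrong. The unknown-command reply above passes 23 for "10: unknown command\n", which is only 20 bytes long, so the call reads past the end of the literal. Below is a minimal sketch of a hypothetical write_response() helper, not part of flexnbd, that takes the length from strlen() and retries short or interrupted writes:

```c
#include <errno.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write a NUL-terminated response string in full,
 * taking the length from strlen() rather than a hand-counted constant.
 * Returns 0 on success, -1 on error (errno is left as set by write()). */
int write_response( int fd, const char *msg )
{
    size_t len = strlen( msg );
    size_t done = 0;

    while ( done < len ) {
        ssize_t n = write( fd, msg + done, len - done );
        if ( n == -1 ) {
            if ( errno == EINTR ) { continue; }  /* interrupted: retry */
            return -1;
        }
        done += (size_t) n;                      /* handle short writes */
    }
    return 0;
}
```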

@@ -1,10 +1,14 @@
#ifndef CONTROL_H #ifndef CONTROL_H
#define CONTROL_H #define CONTROL_H
/* We need this to avoid a complaint about struct server * in
* void accept_control_connection
*/
struct server;
#include "parse.h" #include "parse.h"
#include "mirror.h" #include "mirror.h"
#include "control.h" #include "serve.h"
#include "flexnbd.h" #include "flexnbd.h"
#include "mbox.h" #include "mbox.h"
@@ -15,6 +19,7 @@ struct control {
pthread_t thread; pthread_t thread;
struct self_pipe * open_signal;
struct self_pipe * close_signal; struct self_pipe * close_signal;
/* This is owned by the control object, and used by a /* This is owned by the control object, and used by a

@@ -21,7 +21,6 @@
#include "flexnbd.h" #include "flexnbd.h"
#include "serve.h" #include "serve.h"
#include "listen.h"
#include "util.h" #include "util.h"
#include "control.h" #include "control.h"
#include "status.h" #include "status.h"
@@ -76,8 +75,6 @@ void flexnbd_create_shared(
} }
flexnbd->signal_fd = flexnbd_build_signal_fd(); flexnbd->signal_fd = flexnbd_build_signal_fd();
flexnbd->switch_mutex = flexthread_mutex_create();
} }
@@ -89,7 +86,8 @@ struct flexnbd * flexnbd_create_serving(
int default_deny, int default_deny,
int acl_entries, int acl_entries,
char** s_acl_entries, char** s_acl_entries,
int max_nbd_clients) int max_nbd_clients,
int use_killswitch)
{ {
struct flexnbd * flexnbd = xmalloc( sizeof( struct flexnbd ) ); struct flexnbd * flexnbd = xmalloc( sizeof( struct flexnbd ) );
flexnbd->serve = server_create( flexnbd->serve = server_create(
@@ -101,42 +99,37 @@ struct flexnbd * flexnbd_create_serving(
acl_entries, acl_entries,
s_acl_entries, s_acl_entries,
max_nbd_clients, max_nbd_clients,
use_killswitch,
1); 1);
flexnbd_create_shared( flexnbd, s_ctrl_sock ); flexnbd_create_shared( flexnbd,
s_ctrl_sock );
return flexnbd; return flexnbd;
} }
struct flexnbd * flexnbd_create_listening( struct flexnbd * flexnbd_create_listening(
char* s_ip_address, char* s_ip_address,
char* s_rebind_ip_address,
char* s_port, char* s_port,
char* s_rebind_port,
char* s_file, char* s_file,
char* s_ctrl_sock, char* s_ctrl_sock,
int default_deny, int default_deny,
int acl_entries, int acl_entries,
char** s_acl_entries, char** s_acl_entries )
int max_nbd_clients )
{ {
struct flexnbd * flexnbd = xmalloc( sizeof( struct flexnbd ) ); struct flexnbd * flexnbd = xmalloc( sizeof( struct flexnbd ) );
flexnbd->listen = listen_create( flexnbd->serve = server_create(
flexnbd, flexnbd,
s_ip_address, s_ip_address,
s_rebind_ip_address,
s_port, s_port,
s_rebind_port,
s_file, s_file,
default_deny, default_deny,
acl_entries, acl_entries,
s_acl_entries, s_acl_entries,
max_nbd_clients); 1, 0, 0);
flexnbd->serve = flexnbd->listen->init_serve;
flexnbd_create_shared( flexnbd, s_ctrl_sock ); flexnbd_create_shared( flexnbd, s_ctrl_sock );
return flexnbd; return flexnbd;
} }
void flexnbd_spawn_control(struct flexnbd * flexnbd ) void flexnbd_spawn_control(struct flexnbd * flexnbd )
{ {
NULLCHECK( flexnbd ); NULLCHECK( flexnbd );
@@ -158,8 +151,10 @@ void flexnbd_stop_control( struct flexnbd * flexnbd )
NULLCHECK( flexnbd->control ); NULLCHECK( flexnbd->control );
control_signal_close( flexnbd->control ); control_signal_close( flexnbd->control );
FATAL_UNLESS( 0 == pthread_join( flexnbd->control->thread, NULL ), pthread_t tid = flexnbd->control->thread;
FATAL_UNLESS( 0 == pthread_join( tid, NULL ),
"Failed joining the control thread" ); "Failed joining the control thread" );
debug( "Control thread %p pthread_join returned", tid );
} }
@@ -175,53 +170,23 @@ void flexnbd_destroy( struct flexnbd * flexnbd )
if ( flexnbd->control ) { if ( flexnbd->control ) {
control_destroy( flexnbd->control ); control_destroy( flexnbd->control );
} }
if ( flexnbd->listen ) {
listen_destroy( flexnbd->listen );
}
flexthread_mutex_destroy( flexnbd->switch_mutex );
close( flexnbd->signal_fd ); close( flexnbd->signal_fd );
free( flexnbd ); free( flexnbd );
} }
/* THOU SHALT NOT DEREFERENCE flexnbd->serve OUTSIDE A SWITCH LOCK
*/
void flexnbd_lock_switch( struct flexnbd * flexnbd )
{
NULLCHECK( flexnbd );
flexthread_mutex_lock( flexnbd->switch_mutex );
}
void flexnbd_unlock_switch( struct flexnbd * flexnbd )
{
NULLCHECK( flexnbd );
flexthread_mutex_unlock( flexnbd->switch_mutex );
}
int flexnbd_switch_locked( struct flexnbd * flexnbd )
{
NULLCHECK( flexnbd );
return flexthread_mutex_held( flexnbd->switch_mutex );
}
struct server * flexnbd_server( struct flexnbd * flexnbd ) struct server * flexnbd_server( struct flexnbd * flexnbd )
{ {
NULLCHECK( flexnbd ); NULLCHECK( flexnbd );
return flexnbd->serve; return flexnbd->serve;
} }
void flexnbd_replace_acl( struct flexnbd * flexnbd, struct acl * acl ) void flexnbd_replace_acl( struct flexnbd * flexnbd, struct acl * acl )
{ {
NULLCHECK( flexnbd ); NULLCHECK( flexnbd );
flexnbd_lock_switch( flexnbd );
{
server_replace_acl( flexnbd_server(flexnbd), acl ); server_replace_acl( flexnbd_server(flexnbd), acl );
} }
flexnbd_unlock_switch( flexnbd );
}
struct status * flexnbd_status_create( struct flexnbd * flexnbd ) struct status * flexnbd_status_create( struct flexnbd * flexnbd )
@@ -229,16 +194,10 @@ struct status * flexnbd_status_create( struct flexnbd * flexnbd )
NULLCHECK( flexnbd ); NULLCHECK( flexnbd );
struct status * status; struct status * status;
flexnbd_lock_switch( flexnbd );
{
status = status_create( flexnbd_server( flexnbd ) ); status = status_create( flexnbd_server( flexnbd ) );
}
flexnbd_unlock_switch( flexnbd );
return status; return status;
} }
/** THOU SHALT *ONLY* CALL THIS FROM INSIDE A SWITCH LOCK
*/
void flexnbd_set_server( struct flexnbd * flexnbd, struct server * serve ) void flexnbd_set_server( struct flexnbd * flexnbd, struct server * serve )
{ {
NULLCHECK( flexnbd ); NULLCHECK( flexnbd );
@@ -246,40 +205,21 @@ void flexnbd_set_server( struct flexnbd * flexnbd, struct server * serve )
} }
/* Calls the given callback to exchange server objects, then sets /* Get the default_deny of the current server object. */
* flexnbd->server so everything else can see it. */
void flexnbd_switch( struct flexnbd * flexnbd, struct server *(listen_cb)(struct listen *) )
{
NULLCHECK( flexnbd );
NULLCHECK( flexnbd->listen );
flexnbd_lock_switch( flexnbd );
{
struct server * new_server = listen_cb( flexnbd->listen );
NULLCHECK( new_server );
flexnbd_set_server( flexnbd, new_server );
}
flexnbd_unlock_switch( flexnbd );
}
/* Get the default_deny of the current server object. This takes the
* switch_lock to avoid nastiness if the server switches and gets freed
* in the dereference chain.
* This means that this function must not be called if the switch lock
* is already held.
*/
int flexnbd_default_deny( struct flexnbd * flexnbd ) int flexnbd_default_deny( struct flexnbd * flexnbd )
{ {
int result;
NULLCHECK( flexnbd ); NULLCHECK( flexnbd );
flexnbd_lock_switch( flexnbd ); return server_default_deny( flexnbd->serve );
{
result = server_default_deny( flexnbd->serve );
} }
flexnbd_unlock_switch( flexnbd );
return result;
void make_writable( const char * filename )
{
NULLCHECK( filename );
FATAL_IF_NEGATIVE( chmod( filename, S_IWUSR ),
"Couldn't chmod %s: %s",
filename,
strerror( errno ) );
} }
@@ -287,22 +227,16 @@ int flexnbd_serve( struct flexnbd * flexnbd )
{ {
NULLCHECK( flexnbd ); NULLCHECK( flexnbd );
int success; int success;
struct self_pipe * open_signal = NULL;
if ( flexnbd->control ){ if ( flexnbd->control ){
debug( "Spawning control thread" ); debug( "Spawning control thread" );
flexnbd_spawn_control( flexnbd ); flexnbd_spawn_control( flexnbd );
open_signal = flexnbd->control->open_signal;
} }
if ( flexnbd->listen ){ success = do_serve( flexnbd->serve, open_signal );
success = do_listen( flexnbd->listen ); debug("do_serve success is %d", success );
}
else {
do_serve( flexnbd->serve );
/* We can't tell here what the intent was. We can
* legitimately exit either in control or not.
*/
success = 1;
}
if ( flexnbd->control ) { if ( flexnbd->control ) {
debug( "Stopping control thread" ); debug( "Stopping control thread" );

@@ -4,7 +4,7 @@
#include "acl.h" #include "acl.h"
#include "mirror.h" #include "mirror.h"
#include "serve.h" #include "serve.h"
#include "listen.h" #include "proxy.h"
#include "self_pipe.h" #include "self_pipe.h"
#include "mbox.h" #include "mbox.h"
#include "control.h" #include "control.h"
@@ -12,29 +12,21 @@
/* Carries the "globals". */ /* Carries the "globals". */
struct flexnbd { struct flexnbd {
/* Our serve pointer should never be dereferenced outside a
/* We always have a serve pointer, but it should never be * flexnbd_switch_lock/unlock pair.
* dereferenced outside a flexnbd_switch_lock/unlock pair.
*/ */
struct server * serve; struct server * serve;
/* We only have a listen object if the process was started in
* listen mode.
*/
struct listen * listen;
/* We only have a control object if a control socket name was /* We only have a control object if a control socket name was
* passed on the command line. * passed on the command line.
*/ */
struct control * control; struct control * control;
/* switch_mutex is the lock around dereferencing the serve
* pointer.
*/
struct flexthread_mutex * switch_mutex;
/* File descriptor for a signalfd(2) signal stream. */ /* File descriptor for a signalfd(2) signal stream. */
int signal_fd; int signal_fd;
}; };
struct flexnbd * flexnbd_create(void); struct flexnbd * flexnbd_create(void);
struct flexnbd * flexnbd_create_serving( struct flexnbd * flexnbd_create_serving(
char* s_ip_address, char* s_ip_address,
@@ -44,34 +36,30 @@ struct flexnbd * flexnbd_create_serving(
int default_deny, int default_deny,
int acl_entries, int acl_entries,
char** s_acl_entries, char** s_acl_entries,
int max_nbd_clients); int max_nbd_clients,
int use_killswitch);
struct flexnbd * flexnbd_create_listening( struct flexnbd * flexnbd_create_listening(
char* s_ip_address, char* s_ip_address,
char* s_rebind_ip_address,
char* s_port, char* s_port,
char* s_rebind_port,
char* s_file, char* s_file,
char* s_ctrl_sock, char* s_ctrl_sock,
int default_deny, int default_deny,
int acl_entries, int acl_entries,
char** s_acl_entries, char** s_acl_entries );
int max_nbd_clients );
void flexnbd_destroy( struct flexnbd * ); void flexnbd_destroy( struct flexnbd * );
enum mirror_state; enum mirror_state;
enum mirror_state flexnbd_get_mirror_state( struct flexnbd * ); enum mirror_state flexnbd_get_mirror_state( struct flexnbd * );
void flexnbd_lock_switch( struct flexnbd * );
void flexnbd_unlock_switch( struct flexnbd * );
int flexnbd_switch_locked( struct flexnbd * );
int flexnbd_default_deny( struct flexnbd * ); int flexnbd_default_deny( struct flexnbd * );
void flexnbd_set_server( struct flexnbd * flexnbd, struct server * serve ); void flexnbd_set_server( struct flexnbd * flexnbd, struct server * serve );
void flexnbd_switch( struct flexnbd * flexnbd, struct server *(listen_cb)(struct listen *) );
int flexnbd_signal_fd( struct flexnbd * flexnbd ); int flexnbd_signal_fd( struct flexnbd * flexnbd );
int flexnbd_serve( struct flexnbd * flexnbd ); int flexnbd_serve( struct flexnbd * flexnbd );
int flexnbd_proxy( struct flexnbd * flexnbd );
struct server * flexnbd_server( struct flexnbd * flexnbd ); struct server * flexnbd_server( struct flexnbd * flexnbd );
void flexnbd_replace_acl( struct flexnbd * flexnbd, struct acl * acl ); void flexnbd_replace_acl( struct flexnbd * flexnbd, struct acl * acl );
struct status * flexnbd_status_create( struct flexnbd * flexnbd ); struct status * flexnbd_status_create( struct flexnbd * flexnbd );
#endif #endif

@@ -10,84 +10,69 @@
#include "util.h" #include "util.h"
#include "bitset.h" #include "bitset.h"
#include "ioutil.h"
struct bitset_mapping* build_allocation_map(int fd, uint64_t size, int resolution)
int build_allocation_map(struct bitset * allocation_map, int fd)
{ {
/* break blocking ioctls down */
const unsigned long max_length = 100*1024*1024;
const unsigned int max_extents = 1000;
unsigned long offset = 0;
struct {
struct fiemap fiemap;
struct fiemap_extent extents[max_extents];
} fiemap_static;
struct fiemap* fiemap = (struct fiemap*) &fiemap_static;
memset(&fiemap_static, 0, sizeof(fiemap_static));
for (offset = 0; offset < allocation_map->size; ) {
unsigned int i; unsigned int i;
struct bitset_mapping* allocation_map = bitset_alloc(size, resolution);
struct fiemap *fiemap_count = NULL, *fiemap = NULL;
fiemap_count = (struct fiemap*) xmalloc(sizeof(struct fiemap)); fiemap->fm_start = offset;
fiemap_count->fm_start = 0; fiemap->fm_length = max_length;
fiemap_count->fm_length = size; if ( offset + max_length > allocation_map->size ) {
fiemap_count->fm_flags = 0; fiemap->fm_length = allocation_map->size-offset;
fiemap_count->fm_extent_count = 0;
fiemap_count->fm_mapped_extents = 0;
/* Find out how many extents there are */
if (ioctl(fd, FS_IOC_FIEMAP, fiemap_count) < 0) {
debug( "Couldn't get fiemap_count, returning no allocation_map" );
goto no_map;
} }
/* Resize fiemap to allow us to read in the extents */ fiemap->fm_flags = FIEMAP_FLAG_SYNC;
fiemap = (struct fiemap*)xmalloc( fiemap->fm_extent_count = max_extents;
sizeof(struct fiemap) + (
sizeof(struct fiemap_extent) *
fiemap_count->fm_mapped_extents
)
);
/* realloc makes valgrind complain a lot */
memcpy(fiemap, fiemap_count, sizeof(struct fiemap));
free( fiemap_count );
fiemap->fm_extent_count = fiemap->fm_mapped_extents;
fiemap->fm_mapped_extents = 0; fiemap->fm_mapped_extents = 0;
if ( ioctl( fd, FS_IOC_FIEMAP, fiemap ) < 0 ) { if ( ioctl( fd, FS_IOC_FIEMAP, fiemap ) < 0 ) {
debug( "Couldn't get fiemap, returning no allocation_map" ); debug( "Couldn't get fiemap, returning no allocation_map" );
goto no_map; return 0; /* it's up to the caller to free the map */
} }
else {
for ( i = 0; i < fiemap->fm_mapped_extents; i++ ) { for ( i = 0; i < fiemap->fm_mapped_extents; i++ ) {
bitset_set_range( bitset_set_range( allocation_map,
allocation_map,
fiemap->fm_extents[i].fe_logical, fiemap->fm_extents[i].fe_logical,
fiemap->fm_extents[i].fe_length fiemap->fm_extents[i].fe_length );
);
}
/* This is pointlessly verbose for real discs, it's here as a
* reference for pulling data out of the allocation map */
if ( 0 ) {
for (i=0; i<(size/resolution); i++) {
debug("map[%d] = %d%d%d%d%d%d%d%d",
i,
(allocation_map->bits[i] & 1) == 1,
(allocation_map->bits[i] & 2) == 2,
(allocation_map->bits[i] & 4) == 4,
(allocation_map->bits[i] & 8) == 8,
(allocation_map->bits[i] & 16) == 16,
(allocation_map->bits[i] & 32) == 32,
(allocation_map->bits[i] & 64) == 64,
(allocation_map->bits[i] & 128) == 128
);
}
} }
free(fiemap); /* must move the offset on, but careful not to jump max_length
* if we've actually hit max_extents.
*/
if (fiemap->fm_mapped_extents > 0) {
struct fiemap_extent *last = &fiemap->fm_extents[
fiemap->fm_mapped_extents-1
];
offset = last->fe_logical + last->fe_length;
}
else {
offset += fiemap->fm_length;
}
}
}
debug("Successfully built allocation map"); debug("Successfully built allocation map");
return allocation_map; return 1;
no_map:
free( allocation_map );
if ( NULL != fiemap ) { free( fiemap ); }
if ( NULL != fiemap_count ) { free( fiemap_count ); }
return NULL;
} }
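The build_allocation_map header comment says that any block or part block touched by an extent counts as allocated. The sketch below shows that rounding. It assumes a plain byte array with one bit per resolution-sized block, used as a stand-in for bitset_set_range() on flexnbd's struct bitset, whose internals are not shown here:

```c
#include <stdint.h>

/* Hypothetical stand-in for bitset_set_range(): mark every
 * resolution-sized block that overlaps [from, from + len) as allocated. */
void mark_extent( unsigned char *bits, uint64_t resolution,
                  uint64_t from, uint64_t len )
{
    if ( len == 0 ) { return; }

    uint64_t first = from / resolution;               /* block of first byte */
    uint64_t last  = ( from + len - 1 ) / resolution; /* block of last byte */

    for ( uint64_t b = first; b <= last; b++ ) {
        bits[b / 8] |= (unsigned char) ( 1u << ( b % 8 ) );
    }
}
```

With a 4096-byte resolution, an extent covering bytes 4000 to 4999 straddles a block boundary and therefore sets the bits for both block 0 and block 1.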
@@ -136,7 +121,12 @@ int writeloop(int filedes, const void *buffer, size_t size)
size_t written=0; size_t written=0;
while (written < size) { while (written < size) {
ssize_t result = write(filedes, buffer+written, size-written); ssize_t result = write(filedes, buffer+written, size-written);
if (result == -1) { return -1; } if (result == -1) {
if ( errno == EINTR || errno == EAGAIN || errno == EWOULDBLOCK ) {
continue; // busy-wait
}
return -1; // failure
}
written += result; written += result;
} }
return 0; return 0;
@@ -147,9 +137,18 @@ int readloop(int filedes, void *buffer, size_t size)
size_t readden=0; size_t readden=0;
while (readden < size) { while (readden < size) {
ssize_t result = read(filedes, buffer+readden, size-readden); ssize_t result = read(filedes, buffer+readden, size-readden);
if (result == 0 /* EOF */ || result == -1 /* error */) {
if ( result == 0 /* EOF */ ) {
warn( "end-of-file detected while reading" );
return -1; return -1;
} }
if ( result == -1 ) {
if ( errno == EINTR || errno == EAGAIN || errno == EWOULDBLOCK ) {
continue; // busy-wait
}
return -1; // failure
}
readden += result; readden += result;
} }
return 0; return 0;
@@ -162,7 +161,10 @@ int sendfileloop(int out_fd, int in_fd, off64_t *offset, size_t count)
ssize_t result = sendfile64(out_fd, in_fd, offset, count-sent); ssize_t result = sendfile64(out_fd, in_fd, offset, count-sent);
debug("sendfile64(out_fd=%d, in_fd=%d, offset=%p, count-sent=%ld) = %ld", out_fd, in_fd, offset, count-sent, result); debug("sendfile64(out_fd=%d, in_fd=%d, offset=%p, count-sent=%ld) = %ld", out_fd, in_fd, offset, count-sent, result);
if (result == -1) { return -1; } if (result == -1) {
debug( "%s (%i) calling sendfile64()", strerror(errno), errno );
return -1;
}
sent += result; sent += result;
debug("sent=%ld, count=%ld", sent, count); debug("sent=%ld, count=%ld", sent, count);
} }
@@ -280,3 +282,69 @@ int fd_is_closed( int fd_in )
errno = errno_old; errno = errno_old;
return result; return result;
} }
static inline int io_errno_permanent(void)
{
return ( errno != EAGAIN && errno != EWOULDBLOCK && errno != EINTR );
}
/* Returns -1 on EOF or on a permanent error, or the number of bytes read
 * otherwise. Note that 0 bytes may be returned (on EAGAIN, EWOULDBLOCK or
 * EINTR). Unlike read(), 0 is not an EOF! */
ssize_t iobuf_read(int fd, struct iobuf *iobuf, size_t default_size )
{
size_t left;
ssize_t count;
if ( iobuf->needle == 0 ) {
iobuf->size = default_size;
}
left = iobuf->size - iobuf->needle;
debug( "Reading %zu of %zu bytes from fd %i", left, iobuf->size, fd );
count = read( fd, iobuf->buf + iobuf->needle, left );
if ( count > 0 ) {
iobuf->needle += count;
debug( "read() returned %zd bytes", count );
} else if ( count == 0 ) {
warn( "read() returned EOF on fd %i", fd );
errno = 0;
return -1;
} else if ( count == -1 ) {
if ( io_errno_permanent() ) {
warn( SHOW_ERRNO( "read() failed on fd %i", fd ) );
} else {
debug( SHOW_ERRNO( "read() returned 0 bytes" ) );
count = 0;
}
}
return count;
}
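iobuf_read() performs a single read() and advances needle. A caller therefore accumulates a fixed-size message across several calls and treats it as complete once needle reaches size. The following is a simplified sketch of that pattern. iobuf_read_once() and iobuf_is_full() are hypothetical, and unlike the real function, this sketch does not reset size to default_size when needle is 0:

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Same fields as struct iobuf in ioutil.h. */
struct iobuf {
    unsigned char *buf;
    size_t size;
    size_t needle;
};

/* Hypothetical sketch of the iobuf_read() contract: one read() per call,
 * continuing from the needle. Returns the number of bytes read (0 if the
 * buffer is already full, or the fd would block / was interrupted), or -1
 * on EOF or a permanent error. */
ssize_t iobuf_read_once( int fd, struct iobuf *io )
{
    if ( io->needle == io->size ) { return 0; }   /* nothing left to fill */

    ssize_t n = read( fd, io->buf + io->needle, io->size - io->needle );
    if ( n > 0 ) {
        io->needle += (size_t) n;
        return n;
    }
    if ( n == 0 ) { return -1; }                  /* EOF: peer closed */
    if ( errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR ) {
        return 0;                                 /* transient: retry later */
    }
    return -1;                                    /* permanent error */
}

/* A message is complete once the needle reaches the buffer size. */
int iobuf_is_full( const struct iobuf *io )
{
    return io->needle == io->size;
}
```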
ssize_t iobuf_write( int fd, struct iobuf *iobuf )
{
size_t left = iobuf->size - iobuf->needle;
ssize_t count;
debug( "Writing %zu of %zu bytes to fd %i", left, iobuf->size, fd );
count = write( fd, iobuf->buf + iobuf->needle, left );
if ( count >= 0 ) {
iobuf->needle += count;
debug( "write() returned %zd bytes", count );
} else {
if ( io_errno_permanent() ) {
warn( SHOW_ERRNO( "write() failed on fd %i", fd ) );
} else {
debug( SHOW_ERRNO( "write() returned 0 bytes" ) );
count = 0;
}
}
return count;
}
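readloop() and writeloop() retry immediately on EAGAIN/EWOULDBLOCK (the code itself labels this a busy-wait), which spins the CPU on a non-blocking descriptor. One alternative is to wait for readability with poll(2) before retrying. It is sketched here as a hypothetical readloop_poll(), which is not part of flexnbd:

```c
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical variant of readloop(): read exactly `size` bytes, blocking in
 * poll() instead of spinning when a non-blocking fd has no data yet.
 * Returns 0 on success, -1 on EOF or error. */
int readloop_poll( int fd, void *buffer, size_t size )
{
    size_t done = 0;

    while ( done < size ) {
        ssize_t n = read( fd, (char *) buffer + done, size - done );
        if ( n > 0 ) { done += (size_t) n; continue; }
        if ( n == 0 ) { return -1; }              /* EOF before size bytes */
        if ( errno == EINTR ) { continue; }       /* interrupted: retry */
        if ( errno == EAGAIN || errno == EWOULDBLOCK ) {
            struct pollfd pfd = { .fd = fd, .events = POLLIN };
            /* Sleep (no timeout) until the fd is readable, then retry. */
            if ( poll( &pfd, 1, -1 ) == -1 && errno != EINTR ) { return -1; }
            continue;
        }
        return -1;                                /* permanent error */
    }
    return 0;
}
```

The same approach applies to writeloop(), waiting for POLLOUT instead of POLLIN.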

@@ -1,16 +1,26 @@
#ifndef __IOUTIL_H #ifndef __IOUTIL_H
#define __IOUTIL_H #define __IOUTIL_H
#include "serve.h" #include <sys/types.h>
struct bitset_mapping; /* don't need whole of bitset.h here */ struct iobuf {
unsigned char *buf;
size_t size;
size_t needle;
};
/** Returns a bit field representing which blocks are allocated in file ssize_t iobuf_read( int fd, struct iobuf* iobuf, size_t default_size );
* descriptor ''fd''. You must supply the size, and the resolution at which ssize_t iobuf_write( int fd, struct iobuf* iobuf );
* you want the bits to represent allocated blocks. If the OS represents
* allocated blocks at a finer resolution than you've asked for, any block #include "serve.h"
* or part block will count as "allocated" with the corresponding bit set. struct bitset; /* don't need whole of bitset.h here */
/** Scan the file opened in ''fd'' and set the bits in ''allocation_map''
* that correspond to blocks physically allocated on disc (fully or
* partly). If the OS tracks allocation at a finer resolution than the
* bitset's, any block or part block counts as "allocated" and has its
* bit set. Returns 1 on success, 0 otherwise.
*/ */
struct bitset_mapping* build_allocation_map(int fd, off64_t size, int resolution); int build_allocation_map(struct bitset * allocation_map, int fd);
/** Repeat a write() operation that succeeds partially until ''size'' bytes /** Repeat a write() operation that succeeds partially until ''size'' bytes
* are written, or an error is returned, when it returns -1 as usual. * are written, or an error is returned, when it returns -1 as usual.

@@ -1,120 +0,0 @@
#include "listen.h"
#include "serve.h"
#include "util.h"
#include "flexnbd.h"
#include <stdlib.h>
struct listen * listen_create(
struct flexnbd * flexnbd,
char* s_ip_address,
char* s_rebind_ip_address,
char* s_port,
char* s_rebind_port,
char* s_file,
int default_deny,
int acl_entries,
char** s_acl_entries,
int max_nbd_clients )
{
NULLCHECK( flexnbd );
struct listen * listen;
listen = (struct listen *)xmalloc( sizeof( struct listen ) );
listen->flexnbd = flexnbd;
listen->init_serve = server_create(
flexnbd,
s_ip_address,
s_port,
s_file,
default_deny,
acl_entries,
s_acl_entries,
1, 0);
listen->main_serve = server_create(
flexnbd,
s_rebind_ip_address ? s_rebind_ip_address : s_ip_address,
s_rebind_port ? s_rebind_port : s_port,
s_file,
default_deny,
acl_entries,
s_acl_entries,
max_nbd_clients, 1);
return listen;
}
void listen_destroy( struct listen * listen )
{
NULLCHECK( listen );
free( listen );
}
struct server *listen_switch( struct listen * listen )
{
NULLCHECK( listen );
/* TODO: Copy acl from init_serve to main_serve */
/* TODO: rename underlying file from foo.INCOMPLETE to foo */
server_destroy( listen->init_serve );
listen->init_serve = NULL;
info( "Switched to the main server, serving." );
return listen->main_serve;
}
void listen_cleanup( struct listen * listen )
{
NULLCHECK( listen );
if ( flexnbd_switch_locked( listen->flexnbd ) ) {
flexnbd_unlock_switch( listen->flexnbd );
}
}
int do_listen( struct listen * listen )
{
NULLCHECK( listen );
int have_control = 0;
flexnbd_lock_switch( listen->flexnbd );
{
flexnbd_set_server( listen->flexnbd, listen->init_serve );
}
flexnbd_unlock_switch( listen->flexnbd );
/* WATCH FOR RACES HERE: flexnbd->serve is set, but the server
* isn't running yet and the switch lock is released.
*/
have_control = do_serve( listen->init_serve );
if( have_control ) {
info( "Taking control.");
flexnbd_switch( listen->flexnbd, listen_switch );
/* WATCH FOR RACES HERE: the server hasn't been
* restarted before we release the flexnbd switch lock.
* do_serve doesn't return, so there's not a lot of
* choice about that.
*/
do_serve( listen->main_serve );
}
else {
warn("Failed to take control, giving up.");
server_destroy( listen->init_serve );
listen->init_serve = NULL;
}
/* TODO: here we must signal the control thread to stop before
* it tries to */
server_destroy( listen->main_serve );
listen->main_serve = NULL;
debug("Listen done, cleaning up");
listen_cleanup( listen );
return have_control;
}

@@ -1,28 +0,0 @@
#ifndef LISTEN_H
#define LISTEN_H
#include "flexnbd.h"
#include "serve.h"
struct listen {
struct flexnbd * flexnbd;
struct server * init_serve;
struct server * main_serve;
};
struct listen * listen_create(
struct flexnbd * flexnbd,
char* s_ip_address,
char* s_rebind_ip_address,
char* s_port,
char* s_rebind_port,
char* s_file,
int default_deny,
int acl_entries,
char** s_acl_entries,
int max_nbd_clients );
void listen_destroy( struct listen* );
int do_listen( struct listen * );
#endif

@@ -3,7 +3,6 @@
#include <signal.h> #include <signal.h>
int main(int argc, char** argv) int main(int argc, char** argv)
{ {
signal(SIGPIPE, SIG_IGN); /* calls to splice() unhelpfully throw this */ signal(SIGPIPE, SIG_IGN); /* calls to splice() unhelpfully throw this */

@@ -19,23 +19,81 @@
#include "serve.h" #include "serve.h"
#include "util.h" #include "util.h"
#include "ioutil.h" #include "ioutil.h"
#include "sockutil.h"
#include "parse.h" #include "parse.h"
#include "readwrite.h" #include "readwrite.h"
#include "bitset.h" #include "bitset.h"
#include "self_pipe.h" #include "self_pipe.h"
#include "status.h" #include "status.h"
#include <stdlib.h> #include <stdlib.h>
#include <string.h> #include <string.h>
#include <sys/un.h> #include <sys/un.h>
#include <unistd.h> #include <unistd.h>
#include <sys/mman.h>
#include <ev.h>
/* compat with older libev */
#ifndef EVBREAK_ONE
#define ev_run( loop, flags ) ev_loop( loop, flags )
#define ev_break(loop, how) ev_unloop( loop, how )
#define EVBREAK_ONE EVUNLOOP_ONE
#define EVBREAK_ALL EVUNLOOP_ALL
#endif
/* We use this to keep track of the socket request data we need to send */
struct xfer {
/* Store the bytes we need to send before the data, or receive back */
union {
struct nbd_request_raw req_raw;
struct nbd_reply_raw rsp_raw;
} hdr;
/* what in mirror->mapped we should write, and how much of it we've done */
uint64_t from;
uint64_t len;
uint64_t written;
/* number of bytes of response read */
uint64_t read;
};
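struct xfer stores the raw NBD request (or reply) header alongside the range of mirror->mapped being sent. For reference, the sketch below packs a 28-byte NBD write request in network byte order, following the NBD protocol's request layout: 32-bit magic 0x25609513, 32-bit type (1 for a write), 8-byte opaque handle, 64-bit offset and 32-bit length. The function and macro names are hypothetical and are not taken from flexnbd's own headers:

```c
#include <stdint.h>
#include <string.h>

/* Values from the NBD protocol specification. */
#define SKETCH_NBD_REQUEST_MAGIC 0x25609513u
#define SKETCH_NBD_CMD_WRITE     1u
#define SKETCH_NBD_REQUEST_SIZE  28

/* Store the low `n` bytes of `v` big-endian at `p`. */
static void put_be( unsigned char *p, uint64_t v, int n )
{
    for ( int i = n - 1; i >= 0; i-- ) {
        p[i] = (unsigned char) ( v & 0xff );
        v >>= 8;
    }
}

/* Hypothetical helper: fill `out` with an NBD write request header.
 * Layout: magic (4) | type (4) | handle (8) | offset (8) | length (4). */
void pack_nbd_write_request( unsigned char out[SKETCH_NBD_REQUEST_SIZE],
                             const char handle[8], uint64_t from, uint32_t len )
{
    put_be( out + 0,  SKETCH_NBD_REQUEST_MAGIC, 4 );
    put_be( out + 4,  SKETCH_NBD_CMD_WRITE, 4 );
    memcpy( out + 8,  handle, 8 );   /* opaque: echoed back in the reply */
    put_be( out + 16, from, 8 );
    put_be( out + 24, len, 4 );
}
```

The len payload bytes follow the header on the wire. This is why struct xfer tracks header progress and data progress separately.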
struct mirror_ctrl {
struct server *serve;
struct mirror *mirror;
/* libev stuff */
struct ev_loop *ev_loop;
ev_io read_watcher;
ev_io write_watcher;
ev_timer timeout_watcher;
ev_timer limit_watcher;
ev_io abandon_watcher;
/* We set this if the bitset stream is getting uncomfortably full, and unset
* once it's emptier */
int clear_events;
/* This is set once all clients have been closed, to let the mirror know
* it's safe to finish once the queue is empty */
int clients_closed;
/* Use this to keep track of what we're copying at any moment */
struct xfer xfer;
};
struct mirror * mirror_alloc( struct mirror * mirror_alloc(
union mysockaddr * connect_to, union mysockaddr * connect_to,
union mysockaddr * connect_from, union mysockaddr * connect_from,
int max_Bps, uint64_t max_Bps,
int action_at_finish, enum mirror_finish_action action_at_finish,
struct mbox * commit_signal) struct mbox * commit_signal)
{ {
struct mirror * mirror; struct mirror * mirror;
@@ -47,6 +105,12 @@ struct mirror * mirror_alloc(
mirror->action_at_finish = action_at_finish; mirror->action_at_finish = action_at_finish;
mirror->commit_signal = commit_signal; mirror->commit_signal = commit_signal;
mirror->commit_state = MS_UNKNOWN; mirror->commit_state = MS_UNKNOWN;
mirror->abandon_signal = self_pipe_create();
if ( mirror->abandon_signal == NULL ) {
warn( "Couldn't create mirror abandon signal" );
return NULL;
}
return mirror; return mirror;
} }
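mirror_alloc() now creates an abandon_signal with self_pipe_create(). struct mirror_ctrl has an abandon_watcher ev_io, presumably watching that signal. The self-pipe technique behind this can be sketched with plain POSIX calls. The names below are hypothetical, and this is not flexnbd's self_pipe implementation:

```c
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical minimal self-pipe: one thread signals by writing a byte,
 * another detects it by polling (or watching) the read end. */
struct sketch_self_pipe { int read_fd; int write_fd; };

int sketch_self_pipe_init( struct sketch_self_pipe *sp )
{
    int fds[2];
    if ( pipe( fds ) == -1 ) { return -1; }
    /* Non-blocking so neither signalling nor clearing can ever block. */
    for ( int i = 0; i < 2; i++ ) {
        int flags = fcntl( fds[i], F_GETFL );
        if ( flags == -1 || fcntl( fds[i], F_SETFL, flags | O_NONBLOCK ) == -1 ) {
            close( fds[0] );
            close( fds[1] );
            return -1;
        }
    }
    sp->read_fd = fds[0];
    sp->write_fd = fds[1];
    return 0;
}

/* Raise the signal. A full pipe (EAGAIN) already means "signalled". */
void sketch_self_pipe_signal( struct sketch_self_pipe *sp )
{
    char byte = 'X';
    ssize_t rc = write( sp->write_fd, &byte, 1 );
    (void) rc;
}

/* Return 1 if the signal has been raised, without consuming it. */
int sketch_self_pipe_is_raised( struct sketch_self_pipe *sp )
{
    struct pollfd pfd = { .fd = sp->read_fd, .events = POLLIN };
    return poll( &pfd, 1, 0 ) > 0 && ( pfd.revents & POLLIN );
}

/* Drain all pending bytes so the signal can be reused. */
void sketch_self_pipe_clear( struct sketch_self_pipe *sp )
{
    char buf[64];
    while ( read( sp->read_fd, buf, sizeof buf ) > 0 ) { }
}
```

Because the read end is an ordinary file descriptor, an event loop such as libev can watch it alongside sockets, which is what makes the technique useful for interrupting a mirror run.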
@@ -68,6 +132,8 @@ enum mirror_state mirror_get_state( struct mirror * mirror )
return mirror->commit_state; return mirror->commit_state;
} }
#define mirror_state_is( mirror, state ) ( mirror_get_state( mirror ) == ( state ) )
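Function-like macros that expand to a comparison, like mirror_state_is(), need the whole expansion and each argument parenthesised. Otherwise, operators around the call can bind to the wrong operand. The small demonstration below uses a hypothetical get_state() standing in for mirror_get_state():

```c
/* Hypothetical state accessor standing in for mirror_get_state(). */
enum state { S_INIT, S_GO, S_DONE };
static enum state get_state( void ) { return S_DONE; }

/* Unparenthesised: `!STATE_IS_BAD( S_GO )` expands to
 * `!get_state() == S_GO`, which parses as `( !get_state() ) == S_GO`. */
#define STATE_IS_BAD( s )  get_state() == s
/* Parenthesised: the expansion behaves like a single expression. */
#define STATE_IS_GOOD( s ) ( get_state() == ( s ) )

/* State is S_DONE, so "not in S_GO" should be true (1). */
int negated_bad( void )  { return !STATE_IS_BAD( S_GO ); }   /* yields 0 */
int negated_good( void ) { return !STATE_IS_GOOD( S_GO ); }  /* yields 1 */
```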
void mirror_init( struct mirror * mirror, const char * filename ) void mirror_init( struct mirror * mirror, const char * filename )
{ {
@@ -88,8 +154,10 @@ void mirror_init( struct mirror * mirror, const char * filename )
filename filename
); );
mirror->dirty_map = bitset_alloc(size, 4096); FATAL_IF_NEGATIVE(
madvise( mirror->mapped, size, MADV_SEQUENTIAL ),
SHOW_ERRNO( "Failed to madvise() %s", filename )
);
} }
@@ -97,9 +165,13 @@ void mirror_init( struct mirror * mirror, const char * filename )
void mirror_reset( struct mirror * mirror ) void mirror_reset( struct mirror * mirror )
{ {
NULLCHECK( mirror ); NULLCHECK( mirror );
NULLCHECK( mirror->dirty_map );
mirror_set_state( mirror, MS_INIT ); mirror_set_state( mirror, MS_INIT );
bitset_set(mirror->dirty_map);
mirror->all_dirty = 0;
mirror->migration_started = 0;
mirror->offset = 0;
return;
} }
@@ -107,7 +179,7 @@ struct mirror * mirror_create(
const char * filename, const char * filename,
union mysockaddr * connect_to, union mysockaddr * connect_to,
union mysockaddr * connect_from, union mysockaddr * connect_from,
int max_Bps, uint64_t max_Bps,
int action_at_finish, int action_at_finish,
struct mbox * commit_signal) struct mbox * commit_signal)
{ {
@@ -131,9 +203,9 @@ struct mirror * mirror_create(
void mirror_destroy( struct mirror *mirror ) void mirror_destroy( struct mirror *mirror )
{ {
NULLCHECK( mirror ); NULLCHECK( mirror );
self_pipe_destroy( mirror->abandon_signal );
free(mirror->connect_to); free(mirror->connect_to);
free(mirror->connect_from); free(mirror->connect_from);
free(mirror->dirty_map);
free(mirror); free(mirror);
} }
@@ -150,130 +222,20 @@ static const unsigned int mirror_last_pass_after_bytes_written = 100<<20;
* cause the I/O to freeze, however many bytes are left to copy. * cause the I/O to freeze, however many bytes are left to copy.
*/ */
static const int mirror_maximum_passes = 7; static const int mirror_maximum_passes = 7;
#define mirror_last_pass (mirror_maximum_passes - 1)
/* A single mirror pass over the disc, optionally locking IO around the
* transfer. /* This must not be called if there's any chance of further I/O. Methods to
* ensure this include:
* - Ensure image size is 0
* - call server_forbid_new_clients() followed by a successful
* server_close_clients() ; server_join_clients()
*/ */
int mirror_pass(struct server * serve, int should_lock, uint64_t *written)
{
uint64_t current = 0;
int success = 1;
struct bitset_mapping *map = serve->mirror->dirty_map;
*written = 0;
while (current < serve->size) {
int run = bitset_run_count(map, current, mirror_longest_write);
debug("mirror current=%ld, run=%d", current, run);
/* FIXME: we could avoid sending sparse areas of the
* disc here, and probably save a lot of bandwidth and
* time (if we know the destination starts off zeroed).
*/
if (bitset_is_set_at(map, current)) {
/* We've found a dirty area, send it */
debug("^^^ writing");
/* We need to stop the main thread from working
* because it might corrupt the dirty map. This
* is likely to slow things down but will be
* safe.
*/
if (should_lock) { server_lock_io( serve ); }
{
debug("in lock block");
/** FIXME: do something useful with bytes/second */
/** FIXME: error handling code here won't unlock */
socket_nbd_write( serve->mirror->client,
current,
run,
0,
serve->mirror->mapped + current,
MS_REQUEST_LIMIT_SECS);
/* now mark it clean */
bitset_clear_range(map, current, run);
debug("leaving lock block");
}
if (should_lock) { server_unlock_io( serve ); }
*written += run;
}
current += run;
if (serve->mirror->signal_abandon) {
debug("Abandon message received" );
success = 0;
break;
}
}
return success;
}
void mirror_give_control( struct mirror * mirror )
{
debug( "mirror: entrusting and disconnecting" );
/* TODO: set up an error handler to clean up properly on ERROR.
*/
/* A transfer of control is expressed as a 3-way handshake.
* First, we send a REQUEST_ENTRUST. If this fails to be
* received, this thread will simply block until the server is
* restarted. If the remote end doesn't understand it, it'll
* disconnect us, and an ERROR *should* bomb this thread.
* FIXME: make the ERROR work.
* If we get an explicit error back from the remote end, then
* again, this thread will bomb out.
* On receiving a valid response, we send a REQUEST_DISCONNECT,
* and we quit without checking for a response. This is the
* remote server's signal to assume control of the file. The
* reason we don't check for a response is the state we end up
* in if the final message goes astray: if we lose the
* REQUEST_DISCONNECT, the sender has quit and the receiver
* hasn't had a signal to take over yet, so the data is safe.
* If we were to wait for a response to the REQUEST_DISCONNECT,
* the sender and receiver would *both* be servicing write
* requests while the response was in flight, and if the
* response went astray we'd have two servers claiming
* responsibility for the same data.
*
* The meaning of these is as follows:
* The entrust signifies that all the data has been sent, and
* the client is currently paused but not disconnected.
* The disconnect signifies that the client has been
* safely prevented from making any more writes.
*
* Since we lock IO and close the server in mirror_on_exit before
* releasing, we don't actually need to take any action between the
* two here.
*/
socket_nbd_entrust( mirror->client );
socket_nbd_disconnect( mirror->client );
}
/* THIS FUNCTION MUST ONLY BE CALLED WITH THE SERVER'S IO LOCKED. */
void mirror_on_exit( struct server * serve ) void mirror_on_exit( struct server * serve )
{ {
/* Send an explicit entrust and disconnect. After this /* If we're still here, we can shut the server down.
* point we cannot allow any reads or writes to the local file. *
* We do this *before* trying to shut down the server so that if
* the transfer of control fails, we haven't stopped the server
* and already-connected clients don't get needlessly
* disconnected.
*/
debug( "mirror_give_control");
mirror_give_control( serve->mirror );
/* If we're still here, the transfer of control went ok, and the
* remote is listening (or will be shortly). We can shut the
* server down.
* *
* It doesn't matter if we get new client connections before
* now, the IO lock will stop them from doing anything.
*/ */
debug("serve_signal_close"); debug("serve_signal_close");
serve_signal_close( serve ); serve_signal_close( serve );
@@ -287,6 +249,14 @@ void mirror_on_exit( struct server * serve )
*/ */
debug("serve_wait_for_close"); debug("serve_wait_for_close");
serve_wait_for_close( serve ); serve_wait_for_close( serve );
if ( ACTION_UNLINK == serve->mirror->action_at_finish ) {
debug("Unlinking %s", serve->filename );
server_unlink( serve );
}
debug("Sending disconnect");
socket_nbd_disconnect( serve->mirror->client );
info("Mirror sent."); info("Mirror sent.");
} }
@@ -299,16 +269,18 @@ void mirror_cleanup( struct server * serve,
NULLCHECK( mirror ); NULLCHECK( mirror );
info( "Cleaning up mirror thread"); info( "Cleaning up mirror thread");
if ( mirror->mapped ) {
munmap( mirror->mapped, serve->size );
}
mirror->mapped = NULL;
if( mirror->client && mirror->client > 0 ){ if( mirror->client && mirror->client > 0 ){
close( mirror->client ); close( mirror->client );
} }
mirror->client = -1; mirror->client = -1;
if( server_io_locked( serve ) ){ server_unlock_io( serve ); }
} }
int mirror_connect( struct mirror * mirror, off64_t local_size ) int mirror_connect( struct mirror * mirror, off64_t local_size )
{ {
struct sockaddr * connect_from = NULL; struct sockaddr * connect_from = NULL;
@@ -364,39 +336,463 @@ int mirror_connect( struct mirror * mirror, off64_t local_size )
} }
int mirror_should_quit( struct mirror * mirror )
{
switch( mirror->action_at_finish ) {
case ACTION_EXIT:
case ACTION_UNLINK:
return 1;
default:
return 0;
}
}
/*
* If there's an event in the bitset stream of the serve allocation map, we
* use it to construct the next transfer request, covering precisely the area
* that has changed. If there are no events, we take the next
* mirror_longest_write bytes from mirror->offset, until the whole image has
* been sent once.
* TODO: should we detect short events and lengthen them to reduce overhead?
*
* Returns 1 with ctrl->xfer set up for the next transfer, or 0 if there is
* nothing left to send. */
int mirror_setup_next_xfer( struct mirror_ctrl *ctrl )
{
struct mirror* mirror = ctrl->mirror;
struct server* serve = ctrl->serve;
struct bitset_stream_entry e = { .event = BITSET_STREAM_UNSET };
uint64_t current = mirror->offset, run = 0, size = serve->size;
/* Technically, we'd be interested in UNSET events too, but they are never
* generated. TODO if that changes.
*
* We use ctrl->clear_events to start emptying the stream when it's half
* full, and stop when it's a quarter full. This stops a busy client from
* stalling a migration forever. FIXME: made-up numbers.
*/
if ( bitset_stream_size( serve->allocation_map ) > BITSET_STREAM_SIZE / 2 ) {
ctrl->clear_events = 1;
}
while ( ( mirror->offset == serve->size || ctrl->clear_events ) && e.event != BITSET_STREAM_SET ) {
uint64_t events = bitset_stream_size( serve->allocation_map );
if ( events == 0 ) {
break;
}
debug("Dequeueing event");
bitset_stream_dequeue( ctrl->serve->allocation_map, &e );
debug("Dequeued event %i, %zu, %zu", e.event, e.from, e.len);
if ( events < ( BITSET_STREAM_SIZE / 4 ) ) {
ctrl->clear_events = 0;
}
}
if ( e.event == BITSET_STREAM_SET ) {
current = e.from;
run = e.len;
} else if ( current < serve->size ) {
current = mirror->offset;
run = mirror_longest_write;
/* Adjust final block if necessary */
if ( current + run > serve->size ) {
run = size - current;
}
mirror->offset += run;
} else {
return 0;
}
debug( "Next transfer: current=%"PRIu64", run=%"PRIu64, current, run );
struct nbd_request req = {
.magic = REQUEST_MAGIC,
.type = REQUEST_WRITE,
.handle = ".MIRROR.",
.from = current,
.len = run
};
nbd_h2r_request( &req, &ctrl->xfer.hdr.req_raw );
ctrl->xfer.from = current;
ctrl->xfer.len = run;
ctrl->xfer.written = 0;
ctrl->xfer.read = 0;
return 1;
}
uint64_t mirror_current_bps( struct mirror * mirror )
{
uint64_t duration_ms = monotonic_time_ms() - mirror->migration_started;
return mirror->all_dirty / ( ( duration_ms / 1000 ) + 1 );
}
int mirror_exceeds_max_bps( struct mirror * mirror )
{
uint64_t mig_speed = mirror_current_bps( mirror );
debug( "current_bps: %"PRIu64"; max_bps: %"PRIu64, mig_speed, mirror->max_bytes_per_second );
if ( mig_speed > mirror->max_bytes_per_second ) {
return 1;
}
return 0;
}
// ONLY CALL THIS AFTER CLOSING CLIENTS
void mirror_complete( struct server *serve )
{
/* FIXME: Pretty sure this is broken if the action isn't a quit. Just moving code
* around for now, can fix it later. Action is always quit in production */
if ( mirror_should_quit( serve->mirror ) ) {
debug("exit!");
/* FIXME: This depends on blocking I/O right now, so make sure we are */
sock_set_nonblock( serve->mirror->client, 0 );
mirror_on_exit( serve );
info("Server closed, quitting after successful migration");
}
mirror_set_state( serve->mirror, MS_DONE );
return;
}
static void mirror_write_cb( struct ev_loop *loop, ev_io *w, int revents )
{
struct mirror_ctrl* ctrl = (struct mirror_ctrl*) w->data;
NULLCHECK( ctrl );
struct xfer *xfer = &ctrl->xfer;
size_t to_write, hdr_size = sizeof( struct nbd_request_raw );
char *data_loc;
ssize_t count;
if ( !( revents & EV_WRITE ) ) {
warn( "No write event signalled in mirror write callback" );
return;
}
debug( "Mirror write callback invoked with events %d. fd: %i", revents, ctrl->mirror->client );
if ( xfer->written < hdr_size ) {
data_loc = ( (char*) &xfer->hdr.req_raw ) + ctrl->xfer.written;
to_write = hdr_size - xfer->written;
} else {
data_loc = ctrl->mirror->mapped + xfer->from + ( xfer->written - hdr_size );
to_write = xfer->len - ( ctrl->xfer.written - hdr_size );
}
// Actually write some bytes
if ( ( count = write( ctrl->mirror->client, data_loc, to_write ) ) < 0 ) {
if ( errno != EAGAIN && errno != EWOULDBLOCK && errno != EINTR ) {
warn( SHOW_ERRNO( "Couldn't write to listener" ) );
ev_break( loop, EVBREAK_ONE );
}
return;
}
debug( "Wrote %zd bytes", count );
debug( "to_write was %zu, xfer->written was %"PRIu64, to_write, xfer->written );
ctrl->xfer.written += count;
// We wrote some bytes, so reset the timer
ev_timer_again( ctrl->ev_loop, &ctrl->timeout_watcher );
// All bytes written, so now we need to read the NBD reply back.
if ( ctrl->xfer.written == ctrl->xfer.len + hdr_size ) {
ev_io_start( loop, &ctrl->read_watcher );
ev_io_stop( loop, &ctrl->write_watcher );
}
return;
}
static void mirror_read_cb( struct ev_loop *loop, ev_io *w, int revents )
{
struct mirror_ctrl* ctrl = (struct mirror_ctrl*) w->data;
NULLCHECK( ctrl );
struct mirror *m = ctrl->mirror;
NULLCHECK( m );
struct xfer *xfer = &ctrl->xfer;
NULLCHECK( xfer );
if ( !( revents & EV_READ ) ) {
warn( "No read event signalled in mirror read callback" );
return;
}
struct nbd_reply rsp;
ssize_t count;
uint64_t left = sizeof( struct nbd_reply_raw ) - xfer->read;
debug( "Mirror read callback invoked with events %d. fd:%i", revents, m->client );
/* Start / continue reading the NBD response from the mirror. */
if ( ( count = read( m->client, ((char*) &xfer->hdr.rsp_raw) + xfer->read, left ) ) < 0 ) {
if ( errno != EAGAIN && errno != EWOULDBLOCK && errno != EINTR ) {
warn( SHOW_ERRNO( "Couldn't read from listener" ) );
ev_break( loop, EVBREAK_ONE );
} else {
debug( SHOW_ERRNO( "Couldn't read from listener (non-scary)" ) );
}
return;
}
if ( count == 0 ) {
warn( "EOF reading response from server!" );
ev_break( loop, EVBREAK_ONE );
return;
}
// We read some bytes, so reset the timer
ev_timer_again( ctrl->ev_loop, &ctrl->timeout_watcher );
debug( "Read %zd bytes", count );
debug( "left was %"PRIu64", xfer->read was %"PRIu64, left, xfer->read );
xfer->read += count;
if ( xfer->read < sizeof( struct nbd_reply_raw ) ) {
// Haven't read the whole response yet
return;
}
nbd_r2h_reply( &xfer->hdr.rsp_raw, &rsp );
// validate reply, break event loop if bad
if ( rsp.magic != REPLY_MAGIC ) {
warn( "Bad reply magic from listener" );
ev_break( loop, EVBREAK_ONE );
return;
}
if ( rsp.error != 0 ) {
warn( "Error returned from listener: %i", rsp.error );
ev_break( loop, EVBREAK_ONE );
return;
}
if ( memcmp( ".MIRROR.", &rsp.handle[0], 8 ) != 0 ) {
warn( "Bad handle returned from listener" );
ev_break( loop, EVBREAK_ONE );
return;
}
/* transfer was completed, so now we need to either set up the next
* transfer of this pass, set up the first transfer of the next pass, or
* complete the migration */
m->all_dirty += xfer->len;
xfer->read = 0;
xfer->written = 0;
/* This next bit could take a little while, which is fine */
ev_timer_stop( ctrl->ev_loop, &ctrl->timeout_watcher );
/* Set up the next transfer, which may be offset + mirror_longest_write
* or an event from the bitset stream. When offset hits serve->size,
* xfers will be constructed solely from the event stream. Once our estimate
* of time left reaches a sensible number (or the event stream empties),
* we stop new clients from connecting, disconnect existing ones, then
* continue emptying the bitstream. Once it's empty again, we're finished.
*/
int next_xfer = mirror_setup_next_xfer( ctrl );
debug( "next_xfer: %d", next_xfer );
/* Regardless of time estimates, if there's no waiting transfer, we can
* forbid new clients, close the existing ones and move on to the final phase */
if ( !ctrl->clients_closed && ( !next_xfer || server_mirror_eta( ctrl->serve ) < 60 ) ) {
info( "Closing clients to allow mirroring to converge" );
server_forbid_new_clients( ctrl->serve );
server_close_clients( ctrl->serve );
server_join_clients( ctrl->serve );
ctrl->clients_closed = 1;
/* One more try - a new event may have been pushed since our last check
*/
if ( !next_xfer ) {
next_xfer = mirror_setup_next_xfer( ctrl );
}
}
if ( ctrl->clients_closed && !next_xfer ) {
mirror_complete( ctrl->serve );
ev_break( loop, EVBREAK_ONE );
return;
}
/* This is a guard Just In Case */
ERROR_IF( !next_xfer, "Unknown problem - no next transfer to do!" );
ev_io_stop( loop, &ctrl->read_watcher );
/* FIXME: Should we ignore the bwlimit after server_close_clients has been called? */
if ( mirror_exceeds_max_bps( m ) ) {
/* We're over the bandwidth limit, so don't move onto the next transfer
* yet. Our limit_watcher will move us on once we're OK. timeout_watcher
* was disabled further up, so we don't need to stop it here too */
debug( "max_bps exceeded, waiting" );
ev_timer_again( loop, &ctrl->limit_watcher );
} else {
/* We're waiting for the socket to become writable again, so re-enable
* the timeout and write watchers */
ev_timer_again( loop, &ctrl->timeout_watcher );
ev_io_start( loop, &ctrl->write_watcher );
}
return;
}
void mirror_timeout_cb( struct ev_loop *loop, ev_timer *w __attribute__((unused)), int revents )
{
if ( !(revents & EV_TIMER ) ) {
warn( "Mirror timeout called but no timer event signalled" );
return;
}
info( "Mirror timeout signalled" );
ev_break( loop, EVBREAK_ONE );
return;
}
void mirror_abandon_cb( struct ev_loop *loop, ev_io *w, int revents )
{
struct mirror_ctrl* ctrl = (struct mirror_ctrl*) w->data;
NULLCHECK( ctrl );
if ( !(revents & EV_READ ) ) {
warn( "Mirror abandon called but no abandon event signalled" );
return;
}
debug( "Abandon message received" );
mirror_set_state( ctrl->mirror, MS_ABANDONED );
self_pipe_signal_clear( ctrl->mirror->abandon_signal );
ev_break( loop, EVBREAK_ONE );
return;
}
void mirror_limit_cb( struct ev_loop *loop, ev_timer *w, int revents )
{
struct mirror_ctrl* ctrl = (struct mirror_ctrl*) w->data;
NULLCHECK( ctrl );
if ( !(revents & EV_TIMER ) ) {
warn( "Mirror limit callback executed but no timer event signalled" );
return;
}
if ( mirror_exceeds_max_bps( ctrl->mirror ) ) {
debug( "max_bps (%"PRIu64") exceeded, waiting", ctrl->mirror->max_bytes_per_second );
ev_timer_again( loop, w );
} else {
/* We're below the limit, so do the next request */
debug("max_bps not exceeded, performing next transfer" );
ev_io_start( loop, &ctrl->write_watcher );
ev_timer_stop( loop, &ctrl->limit_watcher );
ev_timer_again( loop, &ctrl->timeout_watcher );
}
return;
}
void mirror_run( struct server *serve ) void mirror_run( struct server *serve )
{ {
NULLCHECK( serve ); NULLCHECK( serve );
NULLCHECK( serve->mirror ); NULLCHECK( serve->mirror );
int pass; struct mirror *m = serve->mirror;
uint64_t written;
m->migration_started = monotonic_time_ms();
info("Starting mirror" ); info("Starting mirror" );
for (pass=0; pass < mirror_maximum_passes-1; pass++) {
debug("mirror start pass=%d", pass); /* mirror_setup_next_xfer won't be able to cope with this, so special-case
if ( !mirror_pass( serve, 1, &written ) ){ * it here. There can't be any writes going on, so don't bother locking
debug("Failed mirror pass state is %d", mirror_get_state( serve->mirror ) ); * anything.
debug("pass failed, giving up"); *
return; } */
if ( serve->size == 0 ) {
/* if we've not written anything */ info( "0-byte image special case" );
if (written < mirror_last_pass_after_bytes_written) { break; } mirror_complete( serve );
return;
} }
server_lock_io( serve ); struct mirror_ctrl ctrl;
{ memset( &ctrl, 0, sizeof( struct mirror_ctrl ) );
if ( mirror_pass( serve, 0, &written ) &&
ACTION_EXIT == serve->mirror->action_at_finish) { ctrl.serve = serve;
debug("exit!"); ctrl.mirror = m;
mirror_on_exit( serve );
info("Server closed, quitting " ctrl.ev_loop = EV_DEFAULT;
"after successful migration");
/* gcc warns on -O2. clang is fine. Seems to be the fault of ev.h */
ev_io_init( &ctrl.read_watcher, mirror_read_cb, m->client, EV_READ );
ctrl.read_watcher.data = (void*) &ctrl;
ev_io_init( &ctrl.write_watcher, mirror_write_cb, m->client, EV_WRITE );
ctrl.write_watcher.data = (void*) &ctrl;
ev_init( &ctrl.timeout_watcher, mirror_timeout_cb );
ctrl.timeout_watcher.repeat = MS_REQUEST_LIMIT_SECS_F ;
ev_init( &ctrl.limit_watcher, mirror_limit_cb );
ctrl.limit_watcher.repeat = 1.0; // We check bps every second. seems sane.
ctrl.limit_watcher.data = (void*) &ctrl;
ev_init( &ctrl.abandon_watcher, mirror_abandon_cb );
ev_io_set( &ctrl.abandon_watcher, m->abandon_signal->read_fd, EV_READ );
ctrl.abandon_watcher.data = (void*) &ctrl;
ev_io_start( ctrl.ev_loop, &ctrl.abandon_watcher );
ERROR_UNLESS(
mirror_setup_next_xfer( &ctrl ),
"Couldn't find first transfer for mirror!"
);
/* Start by writing xfer 0 to the listener */
ev_io_start( ctrl.ev_loop, &ctrl.write_watcher );
/* We want to timeout during the first write as well as subsequent ones */
ev_timer_again( ctrl.ev_loop, &ctrl.timeout_watcher );
/* Everything up to here is blocking. We switch to non-blocking so we
* can handle rate-limiting and weird error conditions better. TODO: We
* should expand the event loop upwards so we can do the same there too */
sock_set_nonblock( m->client, 1 );
bitset_enable_stream( serve->allocation_map );
info( "Entering event loop" );
ev_run( ctrl.ev_loop, 0 );
info( "Exited event loop" );
/* Parent code might expect a blocking socket */
sock_set_nonblock( m->client, 0 );
/* Errors in the event loop don't track I/O lock state or try to restore
* it to something sane - they just terminate the event loop with state !=
* MS_DONE. We re-allow new clients here if necessary.
*/
if ( m->action_at_finish == ACTION_NOTHING || m->commit_state != MS_DONE ) {
server_allow_new_clients( serve );
} }
/* Returning here says "mirroring complete" to the runner. The error
* call retries the migration from scratch. */
if ( m->commit_state != MS_DONE ) {
error( "Event loop exited, but mirroring is not complete" );
/* mirror_reset will be called before a retry, so keeping hold of events
* between now and our next mirroring attempt is not useful
*/
bitset_disable_stream( serve->allocation_map );
} }
server_unlock_io( serve );
return;
} }
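mirror_write_cb() has to cope with write(2) accepting only part of the buffer on the now non-blocking socket. It keeps a single counter, xfer->written, that covers the NBD request header followed by the payload taken from the mapped image. It derives the next pointer and length from that counter. The sketch below isolates that arithmetic so it can be checked on its own. next_chunk() and struct chunk are hypothetical names, not part of flexnbd.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: the next contiguous region to hand to write(2). */
struct chunk {
    const char *ptr;
    size_t len;
};

/* Given how many bytes of (header + payload) have already gone out, work
 * out what to send next. As in mirror_write_cb(), the header is finished
 * before any payload byte is sent, and "written" counts both parts. */
static struct chunk next_chunk(const char *hdr, size_t hdr_size,
                               const char *payload, uint64_t payload_len,
                               uint64_t written)
{
    struct chunk c;

    if (written < hdr_size) {
        /* Still inside the request header. */
        c.ptr = hdr + written;
        c.len = hdr_size - written;
    } else {
        /* Header done; resume the payload where the last write stopped. */
        c.ptr = payload + (written - hdr_size);
        c.len = (size_t)(payload_len - (written - hdr_size));
    }
    return c;
}
```

When written reaches hdr_size + payload_len, the request is fully sent. At that point the original switches from the write watcher to the read watcher to collect the NBD reply, which is accumulated with the same kind of counter (xfer->read).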
@@ -439,7 +835,6 @@ void* mirror_runner(void* serve_params_uncast)
NULLCHECK( serve ); NULLCHECK( serve );
NULLCHECK( serve->mirror ); NULLCHECK( serve->mirror );
struct mirror * mirror = serve->mirror; struct mirror * mirror = serve->mirror;
NULLCHECK( mirror->dirty_map );
error_set_handler( (cleanup_handler *) mirror_cleanup, serve ); error_set_handler( (cleanup_handler *) mirror_cleanup, serve );
@@ -470,7 +865,15 @@ void* mirror_runner(void* serve_params_uncast)
mirror_run( serve ); mirror_run( serve );
mirror_set_state( mirror, MS_DONE ); /* On success, this is unnecessary, and harmless ( mirror_cleanup does it
* for us ). But if we've failed and are going to retry on the next run, we
* must close this socket here to have any chance of it succeeding.
*/
if ( mirror->client >= 0 ) {
sock_try_close( mirror->client );
mirror->client = -1;
}
abandon_mirror: abandon_mirror:
return NULL; return NULL;
} }
@@ -480,8 +883,8 @@ struct mirror_super * mirror_super_create(
const char * filename, const char * filename,
union mysockaddr * connect_to, union mysockaddr * connect_to,
union mysockaddr * connect_from, union mysockaddr * connect_from,
int max_Bps, uint64_t max_Bps,
int action_at_finish, enum mirror_finish_action action_at_finish,
struct mbox * state_mbox) struct mbox * state_mbox)
{ {
struct mirror_super * super = xmalloc( sizeof( struct mirror_super) ); struct mirror_super * super = xmalloc( sizeof( struct mirror_super) );
@@ -535,7 +938,7 @@ void * mirror_super_runner( void * serve_uncast )
int first_pass = 1; int first_pass = 1;
int should_retry = 0; int should_retry = 0;
int success = 0; int success = 0, abandoned = 0;
struct mirror * mirror = serve->mirror; struct mirror * mirror = serve->mirror;
struct mirror_super * super = serve->mirror_super; struct mirror_super * super = serve->mirror_super;
@@ -554,12 +957,13 @@ void * mirror_super_runner( void * serve_uncast )
debug( "Supervisor got commit signal" ); debug( "Supervisor got commit signal" );
if ( first_pass ) { if ( first_pass ) {
/* Only retry if the connection attempt was /* Only retry if the connection attempt was successful. Otherwise
* successful. Otherwise the user will see an * the user will see an error reported while we're still trying to
* error reported while we're still trying to * retry behind the scenes. This may race with migration completing
* retry behind the scenes. * but since we "shouldn't retry" in that case either, that's fine
*/ */
should_retry = *commit_state == MS_GO; should_retry = *commit_state == MS_GO;
/* Only send this signal the first time */ /* Only send this signal the first time */
mirror_super_signal_committed( mirror_super_signal_committed(
super, super,
@@ -574,18 +978,26 @@ void * mirror_super_runner( void * serve_uncast )
debug("Supervisor waiting for mirror thread" ); debug("Supervisor waiting for mirror thread" );
pthread_join( mirror->thread, NULL ); pthread_join( mirror->thread, NULL );
/* If we can't connect to the remote end, the watcher for the abandon
* signal never gets installed at the moment, which is why we also check
* it here. */
abandoned =
mirror_get_state( mirror ) == MS_ABANDONED ||
self_pipe_signal_clear( mirror->abandon_signal );
success = MS_DONE == mirror_get_state( mirror ); success = MS_DONE == mirror_get_state( mirror );
if( success ){ if( success ){
info( "Mirror supervisor success, exiting" ); } info( "Mirror supervisor success, exiting" );
else if ( mirror->signal_abandon ) { } else if ( abandoned ) {
info( "Mirror abandoned" ); info( "Mirror abandoned" );
should_retry = 0; should_retry = 0;
} } else if ( should_retry ) {
else if (should_retry){
info( "Mirror failed, retrying" ); info( "Mirror failed, retrying" );
} else {
info( "Mirror failed before commit, giving up" );
} }
else { info( "Mirror failed before commit, giving up" ); }
first_pass = 0; first_pass = 0;
@@ -602,13 +1014,6 @@ void * mirror_super_runner( void * serve_uncast )
} }
while ( should_retry && !success ); while ( should_retry && !success );
serve->mirror = NULL;
serve->mirror_super = NULL;
mirror_super_destroy( super );
debug( "Mirror supervisor done." );
return NULL; return NULL;
} }
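This change replaces the int signal_abandon flag with abandon_signal, a self-pipe. libev can watch the abandon request as an ordinary readable file descriptor (the ev_io on read_fd set up in mirror_run), rather than the mirror polling a flag between writes. flexnbd's own self_pipe API isn't shown in this diff. The sketch below is a hypothetical minimal version built on pipe(2) and fcntl(2). In it, "clear" drains the pipe and reports whether a signal was pending, which is how the supervisor appears to use self_pipe_signal_clear() when computing abandoned.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical minimal self-pipe; not flexnbd's actual struct self_pipe. */
struct tiny_self_pipe {
    int read_fd;
    int write_fd;
};

/* Creates the pipe with both ends non-blocking. Returns 0 on success. */
static int tiny_self_pipe_create(struct tiny_self_pipe *sp)
{
    int fds[2];
    if (pipe(fds) != 0) {
        return -1;
    }
    for (int i = 0; i < 2; i++) {
        int flags = fcntl(fds[i], F_GETFL);
        if (flags < 0 || fcntl(fds[i], F_SETFL, flags | O_NONBLOCK) < 0) {
            close(fds[0]);
            close(fds[1]);
            return -1;
        }
    }
    sp->read_fd = fds[0];
    sp->write_fd = fds[1];
    return 0;
}

/* Makes read_fd readable, waking any poll() or ev_io watcher on it. */
static int tiny_self_pipe_signal(struct tiny_self_pipe *sp)
{
    char byte = 'X';
    return write(sp->write_fd, &byte, 1) == 1 ? 0 : -1;
}

/* Drains every pending byte; returns 1 if a signal was pending, else 0.
 * The non-blocking read returns -1 (EAGAIN) once the pipe is empty. */
static int tiny_self_pipe_clear(struct tiny_self_pipe *sp)
{
    char buf[64];
    int seen = 0;
    while (read(sp->read_fd, buf, sizeof buf) > 0) {
        seen = 1;
    }
    return seen;
}

static void tiny_self_pipe_destroy(struct tiny_self_pipe *sp)
{
    close(sp->read_fd);
    close(sp->write_fd);
}
```

Several signals before a clear collapse into one, which suits a one-shot request like "abandon".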


@@ -40,10 +40,11 @@ enum mirror_state;
* between the end of the written data and the start of the NBD reply. * between the end of the written data and the start of the NBD reply.
*/ */
#define MS_REQUEST_LIMIT_SECS 4 #define MS_REQUEST_LIMIT_SECS 4
#define MS_REQUEST_LIMIT_SECS_F 4.0
enum mirror_finish_action { enum mirror_finish_action {
ACTION_EXIT, ACTION_EXIT,
ACTION_UNLINK,
ACTION_NOTHING ACTION_NOTHING
}; };
@@ -51,6 +52,7 @@ enum mirror_state {
MS_UNKNOWN, MS_UNKNOWN,
MS_INIT, MS_INIT,
MS_GO, MS_GO,
MS_ABANDONED,
MS_DONE, MS_DONE,
MS_FAIL_CONNECT, MS_FAIL_CONNECT,
MS_FAIL_REJECTED, MS_FAIL_REJECTED,
@@ -60,17 +62,25 @@ enum mirror_state {
struct mirror { struct mirror {
pthread_t thread; pthread_t thread;
/* set to 1, then join thread to make mirror terminate early */
int signal_abandon; /* Signal to this then join the thread if you want to abandon mirroring */
struct self_pipe * abandon_signal;
union mysockaddr * connect_to; union mysockaddr * connect_to;
union mysockaddr * connect_from; union mysockaddr * connect_from;
int client; int client;
const char * filename; const char * filename;
off64_t max_bytes_per_second;
/* Limiter, used to restrict migration speed. Only dirty bytes (those going
* over the network) are considered */
uint64_t max_bytes_per_second;
enum mirror_finish_action action_at_finish; enum mirror_finish_action action_at_finish;
char *mapped; char *mapped;
struct bitset_mapping *dirty_map;
/* We need to send every byte at least once; we do so by advancing offset
* through the image, one transfer at a time */
uint64_t offset;
enum mirror_state commit_state; enum mirror_state commit_state;
@@ -78,6 +88,13 @@ struct mirror {
* and checking the remote size, whether successful or not. * and checking the remote size, whether successful or not.
*/ */
struct mbox * commit_signal; struct mbox * commit_signal;
/* The time (from monotonic_time_ms()) the migration was started. Can be
* used to calculate bps, etc. */
uint64_t migration_started;
/* Running count of all bytes we've transferred */
uint64_t all_dirty;
}; };
@@ -99,9 +116,13 @@ struct mirror_super * mirror_super_create(
const char * filename, const char * filename,
union mysockaddr * connect_to, union mysockaddr * connect_to,
union mysockaddr * connect_from, union mysockaddr * connect_from,
int max_Bps, uint64_t max_Bps,
int action_at_finish, enum mirror_finish_action action_at_finish,
struct mbox * state_mbox struct mbox * state_mbox
); );
void * mirror_super_runner( void * serve_uncast ); void * mirror_super_runner( void * serve_uncast );
uint64_t mirror_current_bps( struct mirror * mirror );
#endif #endif
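mirror.h now exports mirror_current_bps(), and struct mirror gains migration_started and all_dirty. mirror_exceeds_max_bps() compares max_bytes_per_second against the average rate since the migration began: all bytes sent, divided by whole elapsed seconds plus one. The sketch below reproduces that arithmetic as pure functions of hypothetical parameters, with no clock and no struct. This makes the effect of the "+ 1" easy to check.

```c
#include <stdint.h>

/* Same arithmetic as mirror_current_bps(): bytes sent so far divided by
 * whole elapsed seconds plus one. The "+ 1" avoids dividing by zero
 * during the first second. */
static uint64_t current_bps(uint64_t bytes_sent, uint64_t started_ms,
                            uint64_t now_ms)
{
    uint64_t duration_ms = now_ms - started_ms;
    return bytes_sent / ((duration_ms / 1000) + 1);
}

/* Same test as mirror_exceeds_max_bps(): strictly greater than the cap. */
static int exceeds_max_bps(uint64_t bytes_sent, uint64_t started_ms,
                           uint64_t now_ms, uint64_t max_bps)
{
    return current_bps(bytes_sent, started_ms, now_ms) > max_bps;
}
```

The divisor is rounded down to whole seconds and then incremented, so it is always larger than the true elapsed time. The reported rate is therefore never above the true average, and the cap is slightly generous rather than strict. Because limit_watcher re-checks once per second, it is this long-run average, not the instantaneous rate, that gets limited.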


@@ -15,10 +15,11 @@ static struct option serve_options[] = {
GETOPT_SOCK, GETOPT_SOCK,
GETOPT_DENY, GETOPT_DENY,
GETOPT_QUIET, GETOPT_QUIET,
GETOPT_KILLSWITCH,
GETOPT_VERBOSE, GETOPT_VERBOSE,
{0} {0}
}; };
static char serve_short_options[] = "hl:p:f:s:d" SOPT_QUIET SOPT_VERBOSE; static char serve_short_options[] = "hl:p:f:s:dk" SOPT_QUIET SOPT_VERBOSE;
static char serve_help_text[] = static char serve_help_text[] =
"Usage: flexnbd " CMD_SERVE " <options> [<acl address>*]\n\n" "Usage: flexnbd " CMD_SERVE " <options> [<acl address>*]\n\n"
"Serve FILE from ADDR:PORT, with an optional control socket at SOCK.\n\n" "Serve FILE from ADDR:PORT, with an optional control socket at SOCK.\n\n"
@@ -27,6 +28,7 @@ static char serve_help_text[] =
"\t--" OPT_PORT ",-p <PORT>\tThe port to serve on.\n" "\t--" OPT_PORT ",-p <PORT>\tThe port to serve on.\n"
"\t--" OPT_FILE ",-f <FILE>\tThe file to serve.\n" "\t--" OPT_FILE ",-f <FILE>\tThe file to serve.\n"
"\t--" OPT_DENY ",-d\tDeny connections by default unless in ACL.\n" "\t--" OPT_DENY ",-d\tDeny connections by default unless in ACL.\n"
"\t--" OPT_KILLSWITCH",-k \tKill the server if a request takes 120 seconds.\n"
SOCK_LINE SOCK_LINE
VERBOSE_LINE VERBOSE_LINE
QUIET_LINE; QUIET_LINE;
@@ -35,9 +37,7 @@ static char serve_help_text[] =
static struct option listen_options[] = { static struct option listen_options[] = {
GETOPT_HELP, GETOPT_HELP,
GETOPT_ADDR, GETOPT_ADDR,
GETOPT_REBIND_ADDR,
GETOPT_PORT, GETOPT_PORT,
GETOPT_REBIND_PORT,
GETOPT_FILE, GETOPT_FILE,
GETOPT_SOCK, GETOPT_SOCK,
GETOPT_DENY, GETOPT_DENY,
@@ -45,24 +45,19 @@ static struct option listen_options[] = {
GETOPT_VERBOSE, GETOPT_VERBOSE,
{0} {0}
}; };
static char listen_short_options[] = "hl:L:p:P:f:s:d" SOPT_QUIET SOPT_VERBOSE; static char listen_short_options[] = "hl:p:f:s:d" SOPT_QUIET SOPT_VERBOSE;
static char listen_help_text[] = static char listen_help_text[] =
"Usage: flexnbd " CMD_LISTEN " <options> [<acl_address>*]\n\n" "Usage: flexnbd " CMD_LISTEN " <options> [<acl_address>*]\n\n"
"Listen for an incoming migration on ADDR:PORT, " "Listen for an incoming migration on ADDR:PORT."
"then switch to REBIND_ADDR:REBIND_PORT on completion "
"to serve FILE.\n\n"
HELP_LINE HELP_LINE
"\t--" OPT_ADDR ",-l <ADDR>\tThe address to listen on.\n" "\t--" OPT_ADDR ",-l <ADDR>\tThe address to listen on.\n"
"\t--" OPT_REBIND_ADDR ",-L <REBIND_ADDR>\tThe address to switch to, if given.\n"
"\t--" OPT_PORT ",-p <PORT>\tThe port to listen on.\n" "\t--" OPT_PORT ",-p <PORT>\tThe port to listen on.\n"
"\t--" OPT_REBIND_PORT ",-P <REBIND_PORT>\tThe port to switch to, if given.\n"
"\t--" OPT_FILE ",-f <FILE>\tThe file to serve.\n" "\t--" OPT_FILE ",-f <FILE>\tThe file to serve.\n"
"\t--" OPT_DENY ",-d\tDeny connections by default unless in ACL.\n" "\t--" OPT_DENY ",-d\tDeny connections by default unless in ACL.\n"
SOCK_LINE SOCK_LINE
VERBOSE_LINE VERBOSE_LINE
QUIET_LINE; QUIET_LINE;
static struct option read_options[] = { static struct option read_options[] = {
GETOPT_HELP, GETOPT_HELP,
GETOPT_ADDR, GETOPT_ADDR,
@@ -118,17 +113,36 @@ static char acl_help_text[] =
VERBOSE_LINE VERBOSE_LINE
QUIET_LINE; QUIET_LINE;
static struct option mirror_speed_options[] = {
GETOPT_HELP,
GETOPT_SOCK,
GETOPT_MAX_SPEED,
GETOPT_QUIET,
GETOPT_VERBOSE,
{0}
};
static char mirror_speed_short_options[] = "hs:m:" SOPT_QUIET SOPT_VERBOSE;
static char mirror_speed_help_text[] =
"Usage: flexnbd " CMD_MIRROR_SPEED " <options>\n\n"
"Set the maximum speed of a migration from a mirroring server listening on SOCK.\n\n"
HELP_LINE
SOCK_LINE
MAX_SPEED_LINE
VERBOSE_LINE
QUIET_LINE;
static struct option mirror_options[] = { static struct option mirror_options[] = {
GETOPT_HELP, GETOPT_HELP,
GETOPT_SOCK, GETOPT_SOCK,
GETOPT_ADDR, GETOPT_ADDR,
GETOPT_PORT, GETOPT_PORT,
GETOPT_UNLINK,
GETOPT_BIND, GETOPT_BIND,
GETOPT_QUIET, GETOPT_QUIET,
GETOPT_VERBOSE, GETOPT_VERBOSE,
{0} {0}
}; };
static char mirror_short_options[] = "hs:l:p:b:" SOPT_QUIET SOPT_VERBOSE; static char mirror_short_options[] = "hs:l:p:ub:" SOPT_QUIET SOPT_VERBOSE;
static char mirror_help_text[] = static char mirror_help_text[] =
"Usage: flexnbd " CMD_MIRROR " <options>\n\n" "Usage: flexnbd " CMD_MIRROR " <options>\n\n"
"Start mirroring from the server with control socket SOCK to one at ADDR:PORT.\n\n" "Start mirroring from the server with control socket SOCK to one at ADDR:PORT.\n\n"
@@ -136,10 +150,27 @@ static char mirror_help_text[] =
"\t--" OPT_ADDR ",-l <ADDR>\tThe address to mirror to.\n" "\t--" OPT_ADDR ",-l <ADDR>\tThe address to mirror to.\n"
"\t--" OPT_PORT ",-p <PORT>\tThe port to mirror to.\n" "\t--" OPT_PORT ",-p <PORT>\tThe port to mirror to.\n"
SOCK_LINE SOCK_LINE
"\t--" OPT_UNLINK ",-u\tUnlink the local file when done.\n"
BIND_LINE BIND_LINE
VERBOSE_LINE VERBOSE_LINE
QUIET_LINE; QUIET_LINE;
static struct option break_options[] = {
GETOPT_HELP,
GETOPT_SOCK,
GETOPT_QUIET,
GETOPT_VERBOSE,
{0}
};
static char break_short_options[] = "hs:" SOPT_QUIET SOPT_VERBOSE;
static char break_help_text[] =
"Usage: flexnbd " CMD_BREAK " <options>\n\n"
"Stop mirroring from the server with control socket SOCK.\n\n"
HELP_LINE
SOCK_LINE
VERBOSE_LINE
QUIET_LINE;
static struct option status_options[] = { static struct option status_options[] = {
GETOPT_HELP, GETOPT_HELP,
@@ -161,10 +192,13 @@ char help_help_text_arr[] =
"Usage: flexnbd <cmd> [cmd options]\n\n" "Usage: flexnbd <cmd> [cmd options]\n\n"
"Commands:\n" "Commands:\n"
"\tflexnbd serve\n" "\tflexnbd serve\n"
"\tflexnbd listen\n"
"\tflexnbd read\n" "\tflexnbd read\n"
"\tflexnbd write\n" "\tflexnbd write\n"
"\tflexnbd acl\n" "\tflexnbd acl\n"
"\tflexnbd mirror\n" "\tflexnbd mirror\n"
"\tflexnbd mirror-speed\n"
"\tflexnbd break\n"
"\tflexnbd status\n" "\tflexnbd status\n"
"\tflexnbd help\n\n" "\tflexnbd help\n\n"
"See flexnbd help <cmd> for further info\n"; "See flexnbd help <cmd> for further info\n";
@@ -175,13 +209,12 @@ char * help_help_text = help_help_text_arr;
int do_serve(struct server* params);
void do_read(struct mode_readwrite_params* params); void do_read(struct mode_readwrite_params* params);
void do_write(struct mode_readwrite_params* params); void do_write(struct mode_readwrite_params* params);
void do_remote_command(char* command, char* mode, int argc, char** argv); void do_remote_command(char* command, char* mode, int argc, char** argv);
void read_serve_param( int c, char **ip_addr, char **ip_port, char **file, char **sock, int *default_deny ) void read_serve_param( int c, char **ip_addr, char **ip_port, char **file, char **sock, int *default_deny, int *use_killswitch )
{ {
switch(c){ switch(c){
case 'h': case 'h':
@@ -204,11 +237,14 @@ void read_serve_param( int c, char **ip_addr, char **ip_port, char **file, char
*default_deny = 1; *default_deny = 1;
break; break;
case 'q': case 'q':
log_level = 4; log_level = QUIET_LOG_LEVEL;
break; break;
case 'v': case 'v':
log_level = VERBOSE_LOG_LEVEL; log_level = VERBOSE_LOG_LEVEL;
break; break;
case 'k':
*use_killswitch = 1;
break;
default: default:
exit_err( serve_help_text ); exit_err( serve_help_text );
break; break;
@@ -218,9 +254,7 @@ void read_serve_param( int c, char **ip_addr, char **ip_port, char **file, char
void read_listen_param( int c, void read_listen_param( int c,
char **ip_addr, char **ip_addr,
char **rebind_ip_addr,
char **ip_port, char **ip_port,
char **rebind_ip_port,
char **file, char **file,
char **sock, char **sock,
int *default_deny ) int *default_deny )
@@ -233,15 +267,9 @@ void read_listen_param( int c,
case 'l': case 'l':
*ip_addr = optarg; *ip_addr = optarg;
break; break;
case 'L':
*rebind_ip_addr = optarg;
break;
case 'p': case 'p':
*ip_port = optarg; *ip_port = optarg;
break; break;
case 'P':
*rebind_ip_port = optarg;
break;
case 'f': case 'f':
*file = optarg; *file = optarg;
break; break;
@@ -252,7 +280,7 @@ void read_listen_param( int c,
*default_deny = 1; *default_deny = 1;
break; break;
case 'q': case 'q':
log_level = 4; log_level = QUIET_LOG_LEVEL;
break; break;
case 'v': case 'v':
log_level = VERBOSE_LOG_LEVEL; log_level = VERBOSE_LOG_LEVEL;
@@ -263,11 +291,11 @@ void read_listen_param( int c,
} }
} }
void read_readwrite_param( int c, char **ip_addr, char **ip_port, char **bind_addr, char **from, char **size) void read_readwrite_param( int c, char **ip_addr, char **ip_port, char **bind_addr, char **from, char **size, char *err_text )
{ {
switch(c){ switch(c){
case 'h': case 'h':
fprintf(stdout, "%s\n", read_help_text ); fprintf(stdout, "%s\n", err_text );
exit( 0 ); exit( 0 );
break; break;
case 'l': case 'l':
@@ -286,13 +314,13 @@ void read_readwrite_param( int c, char **ip_addr, char **ip_port, char **bind_ad
*bind_addr = optarg; *bind_addr = optarg;
break; break;
case 'q': case 'q':
log_level = 4; log_level = QUIET_LOG_LEVEL;
break; break;
case 'v': case 'v':
log_level = VERBOSE_LOG_LEVEL; log_level = VERBOSE_LOG_LEVEL;
break; break;
default: default:
exit_err( read_help_text ); exit_err( err_text );
break; break;
} }
} }
@@ -308,7 +336,7 @@ void read_sock_param( int c, char **sock, char *help_text )
*sock = optarg; *sock = optarg;
break; break;
case 'q': case 'q':
log_level = 4; log_level = QUIET_LOG_LEVEL;
break; break;
case 'v': case 'v':
log_level = VERBOSE_LOG_LEVEL; log_level = VERBOSE_LOG_LEVEL;
@@ -324,7 +352,43 @@ void read_acl_param( int c, char **sock )
read_sock_param( c, sock, acl_help_text ); read_sock_param( c, sock, acl_help_text );
} }
void read_mirror_param( int c, char **sock, char **ip_addr, char **ip_port, char **bind_addr ) void read_mirror_speed_param(
int c,
char **sock,
char **max_speed
)
{
switch( c ) {
case 'h':
fprintf( stdout, "%s\n", mirror_speed_help_text );
exit( 0 );
break;
case 's':
*sock = optarg;
break;
case 'm':
*max_speed = optarg;
break;
case 'q':
log_level = QUIET_LOG_LEVEL;
break;
case 'v':
log_level = VERBOSE_LOG_LEVEL;
break;
default:
exit_err( mirror_speed_help_text );
break;
}
}
void read_mirror_param(
int c,
char **sock,
char **ip_addr,
char **ip_port,
int *unlink,
char **bind_addr )
{ {
switch( c ){ switch( c ){
case 'h': case 'h':
@@ -340,11 +404,14 @@ void read_mirror_param( int c, char **sock, char **ip_addr, char **ip_port, char
case 'p': case 'p':
*ip_port = optarg; *ip_port = optarg;
break; break;
case 'u':
*unlink = 1;
break;
case 'b': case 'b':
*bind_addr = optarg; *bind_addr = optarg;
break; break;
case 'q': case 'q':
log_level = 4; log_level = QUIET_LOG_LEVEL;
break; break;
case 'v': case 'v':
log_level = VERBOSE_LOG_LEVEL; log_level = VERBOSE_LOG_LEVEL;
@@ -355,6 +422,29 @@ void read_mirror_param( int c, char **sock, char **ip_addr, char **ip_port, char
} }
} }
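In `read_mirror_param`, `-u` sets a flag and reads no `optarg`. The `mirror` short-option string `"hs:l:p:ub:"` agrees with this, because `u` has no trailing colon. The long-option table must agree as well. With `no_argument`, `--unlink` is a plain switch. With `required_argument`, it would consume the next word on the command line.

The following is a minimal sketch of that pairing. `SKETCH_FLAG`, `SKETCH_ARG` and `sketch_parse_mirror` are hypothetical stand-ins for the project's `GETOPT_FLAG` / `GETOPT_ARG` macros and its parsing loop, whose exact definitions are not part of this diff.

```c
#include <getopt.h>
#include <stddef.h>

/* Hypothetical stand-ins for GETOPT_FLAG / GETOPT_ARG: a flag takes no
 * argument, an "ARG" option requires one. */
#define SKETCH_FLAG( name, c ) { name, no_argument, NULL, c }
#define SKETCH_ARG( name, c )  { name, required_argument, NULL, c }

static struct option sketch_mirror_options[] = {
    SKETCH_ARG( "sock", 's' ),
    SKETCH_ARG( "addr", 'l' ),
    SKETCH_ARG( "port", 'p' ),
    SKETCH_FLAG( "unlink", 'u' ),  /* matches the colon-less "u" below */
    SKETCH_ARG( "bind", 'b' ),
    { NULL, 0, NULL, 0 }
};

/* Parses mirror-style options. Stores --sock in *sock and returns 1 if
 * --unlink / -u was given, 0 otherwise. */
int sketch_parse_mirror( int argc, char **argv, char **sock )
{
    int c;
    int unlink_when_done = 0;

    optind = 0;  /* glibc and musl treat 0 as "restart scanning from scratch" */
    while ( ( c = getopt_long( argc, argv, "s:l:p:ub:",
                               sketch_mirror_options, NULL ) ) != -1 ) {
        switch ( c ) {
        case 's': *sock = optarg; break;
        case 'u': unlink_when_done = 1; break;
        default: break;  /* other options are ignored in this sketch */
        }
    }
    return unlink_when_done;
}
```

Given `--sock /tmp/ctl --unlink -b 10.0.0.2`, the sketch returns 1, and `-b` still receives `10.0.0.2` because `--unlink` does not consume it.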
void read_break_param( int c, char **sock )
{
switch( c ) {
case 'h':
fprintf( stdout, "%s\n", break_help_text );
exit( 0 );
break;
case 's':
*sock = optarg;
break;
case 'q':
log_level = QUIET_LOG_LEVEL;
break;
case 'v':
log_level = VERBOSE_LOG_LEVEL;
break;
default:
exit_err( break_help_text );
break;
}
}
void read_status_param( int c, char **sock ) void read_status_param( int c, char **sock )
{ {
read_sock_param( c, sock, status_help_text ); read_sock_param( c, sock, status_help_text );
@@ -368,15 +458,18 @@ int mode_serve( int argc, char *argv[] )
char *file = NULL; char *file = NULL;
char *sock = NULL; char *sock = NULL;
int default_deny = 0; // not on by default int default_deny = 0; // not on by default
int use_killswitch = 0;
int err = 0; int err = 0;
int success;
struct flexnbd * flexnbd; struct flexnbd * flexnbd;
while (1) { while (1) {
c = getopt_long(argc, argv, serve_short_options, serve_options, NULL); c = getopt_long(argc, argv, serve_short_options, serve_options, NULL);
if ( c == -1 ) { break; } if ( c == -1 ) { break; }
read_serve_param( c, &ip_addr, &ip_port, &file, &sock, &default_deny ); read_serve_param( c, &ip_addr, &ip_port, &file, &sock, &default_deny, &use_killswitch );
} }
if ( NULL == ip_addr || NULL == ip_port ) { if ( NULL == ip_addr || NULL == ip_port ) {
@@ -389,11 +482,12 @@ int mode_serve( int argc, char *argv[] )
} }
if ( err ) { exit_err( serve_help_text ); } if ( err ) { exit_err( serve_help_text ); }
flexnbd = flexnbd_create_serving( ip_addr, ip_port, file, sock, default_deny, argc - optind, argv + optind, MAX_NBD_CLIENTS ); flexnbd = flexnbd_create_serving( ip_addr, ip_port, file, sock, default_deny, argc - optind, argv + optind, MAX_NBD_CLIENTS, use_killswitch );
flexnbd_serve( flexnbd ); info( "Serving file %s", file );
success = flexnbd_serve( flexnbd );
flexnbd_destroy( flexnbd ); flexnbd_destroy( flexnbd );
return 0; return success ? 0 : 1;
} }
@@ -401,9 +495,7 @@ int mode_listen( int argc, char *argv[] )
{ {
int c; int c;
char *ip_addr = NULL; char *ip_addr = NULL;
char *rebind_ip_addr = NULL;
char *ip_port = NULL; char *ip_port = NULL;
char *rebind_ip_port = NULL;
char *file = NULL; char *file = NULL;
char *sock = NULL; char *sock = NULL;
int default_deny = 0; // not on by default int default_deny = 0; // not on by default
@@ -417,7 +509,7 @@ int mode_listen( int argc, char *argv[] )
c = getopt_long(argc, argv, listen_short_options, listen_options, NULL); c = getopt_long(argc, argv, listen_short_options, listen_options, NULL);
if ( c == -1 ) { break; } if ( c == -1 ) { break; }
read_listen_param( c, &ip_addr, &rebind_ip_addr, &ip_port, &rebind_ip_port, read_listen_param( c, &ip_addr, &ip_port,
&file, &sock, &default_deny ); &file, &sock, &default_deny );
} }
@@ -433,15 +525,12 @@ int mode_listen( int argc, char *argv[] )
flexnbd = flexnbd_create_listening( flexnbd = flexnbd_create_listening(
ip_addr, ip_addr,
rebind_ip_addr,
ip_port, ip_port,
rebind_ip_port,
file, file,
sock, sock,
default_deny, default_deny,
argc - optind, argc - optind,
argv + optind, argv + optind);
MAX_NBD_CLIENTS );
success = flexnbd_serve( flexnbd ); success = flexnbd_serve( flexnbd );
flexnbd_destroy( flexnbd ); flexnbd_destroy( flexnbd );
@@ -536,7 +625,7 @@ int mode_read( int argc, char *argv[] )
if ( c == -1 ) { break; } if ( c == -1 ) { break; }
read_readwrite_param( c, &ip_addr, &ip_port, &bind_addr, &from, &size ); read_readwrite_param( c, &ip_addr, &ip_port, &bind_addr, &from, &size, read_help_text );
} }
if ( NULL == ip_addr || NULL == ip_port ) { if ( NULL == ip_addr || NULL == ip_port ) {
@@ -571,7 +660,7 @@ int mode_write( int argc, char *argv[] )
c = getopt_long(argc, argv, write_short_options, write_options, NULL); c = getopt_long(argc, argv, write_short_options, write_options, NULL);
if ( c == -1 ) { break; } if ( c == -1 ) { break; }
read_readwrite_param( c, &ip_addr, &ip_port, &bind_addr, &from, &size ); read_readwrite_param( c, &ip_addr, &ip_port, &bind_addr, &from, &size, write_help_text );
} }
if ( NULL == ip_addr || NULL == ip_port ) { if ( NULL == ip_addr || NULL == ip_port ) {
@@ -615,17 +704,52 @@ int mode_acl( int argc, char *argv[] )
} }
int mode_mirror_speed( int argc, char *argv[] )
{
int c;
char *sock = NULL;
char *speed = NULL;
while( 1 ) {
c = getopt_long( argc, argv, mirror_speed_short_options, mirror_speed_options, NULL );
if ( -1 == c ) { break; }
read_mirror_speed_param( c, &sock, &speed );
}
if ( NULL == sock ) {
fprintf( stderr, "--sock is required.\n" );
exit_err( mirror_speed_help_text );
}
if ( NULL == speed ) {
fprintf( stderr, "--max-speed is required.\n");
exit_err( mirror_speed_help_text );
}
do_remote_command( "mirror_max_bps", sock, 1, &speed );
return 0;
}
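`mode_mirror_speed` forwards the `--max-speed` string unchanged to the control socket as the single argument of `mirror_max_bps`. Any parsing therefore happens on the server side, which is not shown here. If the client should reject bad input before sending it, a check along the following lines could work. `parse_max_bps` is a hypothetical helper that is not part of flexnbd. It accepts only a plain decimal number of bytes per second.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: validate a --max-speed value (bytes/sec).
 * Returns 0 and stores the value on success, or -1 if the string is empty,
 * does not start with a digit, has trailing junk, or overflows. */
int parse_max_bps( const char *s, uint64_t *out )
{
    char *end;
    unsigned long long v;

    if ( s == NULL || *s == '\0' ) { return -1; }
    /* strtoull() silently accepts leading spaces and a '-' sign, so insist
     * that the very first character is a digit. */
    if ( !isdigit( (unsigned char) s[0] ) ) { return -1; }

    errno = 0;
    v = strtoull( s, &end, 10 );
    if ( errno == ERANGE || *end != '\0' ) { return -1; }

    *out = (uint64_t) v;
    return 0;
}
```

With this check, `1048576` is accepted, while `10k`, `-5` and a 23-digit value are rejected before anything reaches the socket.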
int mode_mirror( int argc, char *argv[] ) int mode_mirror( int argc, char *argv[] )
{ {
int c; int c;
char *sock = NULL; char *sock = NULL;
char *remote_argv[4] = {0}; char *remote_argv[4] = {0};
int err = 0; int err = 0;
int unlink = 0;
remote_argv[2] = "exit";
while (1) { while (1) {
c = getopt_long( argc, argv, mirror_short_options, mirror_options, NULL); c = getopt_long( argc, argv, mirror_short_options, mirror_options, NULL);
if ( -1 == c ) { break; } if ( -1 == c ) { break; }
read_mirror_param( c, &sock, &remote_argv[0], &remote_argv[1], &remote_argv[2] ); read_mirror_param( c,
&sock,
&remote_argv[0],
&remote_argv[1],
&unlink,
&remote_argv[3] );
} }
if ( NULL == sock ){ if ( NULL == sock ){
@@ -637,18 +761,40 @@ int mode_mirror( int argc, char *argv[] )
err = 1; err = 1;
} }
if ( err ) { exit_err( mirror_help_text ); } if ( err ) { exit_err( mirror_help_text ); }
if ( unlink ) { remote_argv[2] = "unlink"; }
if (remote_argv[2] == NULL) { if (remote_argv[3] == NULL) {
do_remote_command( "mirror", sock, 2, remote_argv ); do_remote_command( "mirror", sock, 3, remote_argv );
} }
else { else {
do_remote_command( "mirror", sock, 3, remote_argv ); do_remote_command( "mirror", sock, 4, remote_argv );
} }
return 0; return 0;
} }
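After this change, `mode_mirror` sends `ADDR PORT ACTION [BIND]` over the control socket. `ACTION` defaults to `"exit"` and becomes `"unlink"` when `-u` is given. The argument count is 3 without a bind address and 4 with one.

The following restates that layout as a standalone function so the argument counts are explicit. `build_mirror_argv` is a hypothetical name used only for illustration.

```c
#include <stddef.h>

/* Hypothetical restatement of mode_mirror()'s remote argv layout:
 * ADDR PORT ACTION [BIND]. Returns the number of arguments to send. */
int build_mirror_argv( char *out[4], char *addr, char *port,
                       int unlink_when_done, char *bind_addr )
{
    out[0] = addr;
    out[1] = port;
    out[2] = unlink_when_done ? "unlink" : "exit";
    out[3] = bind_addr;                  /* may be NULL */
    return bind_addr == NULL ? 3 : 4;    /* the bind address is optional */
}
```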
int mode_break( int argc, char *argv[] )
{
int c;
char *sock = NULL;
while (1) {
c = getopt_long( argc, argv, break_short_options, break_options, NULL );
if ( -1 == c ) { break; }
read_break_param( c, &sock );
}
if ( NULL == sock ){
fprintf( stderr, "--sock is required.\n" );
exit_err( break_help_text );
}
do_remote_command( "break", sock, argc - optind, argv + optind );
return 0;
}
int mode_status( int argc, char *argv[] ) int mode_status( int argc, char *argv[] )
{ {
int c; int c;
@@ -670,7 +816,6 @@ int mode_status( int argc, char *argv[] )
return 0; return 0;
} }
int mode_help( int argc, char *argv[] ) int mode_help( int argc, char *argv[] )
{ {
char *cmd; char *cmd;
@@ -718,10 +863,15 @@ void mode(char* mode, int argc, char **argv)
} }
else if ( IS_CMD( CMD_ACL, mode ) ) { else if ( IS_CMD( CMD_ACL, mode ) ) {
mode_acl( argc, argv ); mode_acl( argc, argv );
} else if ( IS_CMD ( CMD_MIRROR_SPEED, mode ) ) {
mode_mirror_speed( argc, argv );
} }
else if ( IS_CMD( CMD_MIRROR, mode ) ) { else if ( IS_CMD( CMD_MIRROR, mode ) ) {
mode_mirror( argc, argv ); mode_mirror( argc, argv );
} }
else if ( IS_CMD( CMD_BREAK, mode ) ) {
mode_break( argc, argv );
}
else if ( IS_CMD( CMD_STATUS, mode ) ) { else if ( IS_CMD( CMD_STATUS, mode ) ) {
mode_status( argc, argv ); mode_status( argc, argv );
} }
@@ -735,4 +885,3 @@ void mode(char* mode, int argc, char **argv)
exit(0); exit(0);
} }


@@ -12,15 +12,18 @@ void mode(char* mode, int argc, char **argv);
#define OPT_HELP "help" #define OPT_HELP "help"
#define OPT_ADDR "addr" #define OPT_ADDR "addr"
#define OPT_REBIND_ADDR "rebind-addr"
#define OPT_BIND "bind" #define OPT_BIND "bind"
#define OPT_PORT "port" #define OPT_PORT "port"
#define OPT_REBIND_PORT "rebind-port"
#define OPT_FILE "file" #define OPT_FILE "file"
#define OPT_SOCK "sock" #define OPT_SOCK "sock"
#define OPT_FROM "from" #define OPT_FROM "from"
#define OPT_SIZE "size" #define OPT_SIZE "size"
#define OPT_DENY "default-deny" #define OPT_DENY "default-deny"
#define OPT_UNLINK "unlink"
#define OPT_CONNECT_ADDR "conn-addr"
#define OPT_CONNECT_PORT "conn-port"
#define OPT_KILLSWITCH "killswitch"
#define OPT_MAX_SPEED "max-speed"
#define CMD_SERVE "serve" #define CMD_SERVE "serve"
#define CMD_LISTEN "listen" #define CMD_LISTEN "listen"
@@ -28,9 +31,11 @@ void mode(char* mode, int argc, char **argv);
#define CMD_WRITE "write" #define CMD_WRITE "write"
#define CMD_ACL "acl" #define CMD_ACL "acl"
#define CMD_MIRROR "mirror" #define CMD_MIRROR "mirror"
#define CMD_MIRROR_SPEED "mirror-speed"
#define CMD_BREAK "break"
#define CMD_STATUS "status" #define CMD_STATUS "status"
#define CMD_HELP "help" #define CMD_HELP "help"
#define LEN_CMD_MAX 7 #define LEN_CMD_MAX 13
#define PATH_LEN_MAX 1024 #define PATH_LEN_MAX 1024
#define ADDR_LEN_MAX 64 #define ADDR_LEN_MAX 64
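The new `LEN_CMD_MAX` value appears to count the terminating NUL, though this is an inference from the numbers rather than something the diff states. The old value, 7, fits the six-letter commands `listen`, `mirror` and `status` plus one byte. The new value, 13, fits `mirror-speed` (12 characters) plus one byte.

The following sketch checks that arithmetic against the command list from the help text. `required_cmd_len` is a hypothetical helper.

```c
#include <stddef.h>
#include <string.h>

#define CMD_MIRROR_SPEED "mirror-speed"
#define LEN_CMD_MAX 13

/* Commands as listed in help_help_text_arr. */
static const char *const commands[] = {
    "serve", "listen", "read", "write", "acl",
    "mirror", CMD_MIRROR_SPEED, "break", "status", "help"
};

/* sizeof on a string literal includes the '\0': 12 + 1 == 13. */
_Static_assert( sizeof( CMD_MIRROR_SPEED ) == LEN_CMD_MAX,
                "LEN_CMD_MAX should fit mirror-speed plus its NUL" );

/* Hypothetical helper: longest command name plus one byte for the NUL. */
size_t required_cmd_len( void )
{
    size_t i, longest = 0;
    for ( i = 0; i < sizeof( commands ) / sizeof( commands[0] ); i++ ) {
        size_t n = strlen( commands[i] );
        if ( n > longest ) { longest = n; }
    }
    return longest + 1;
}
```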
@@ -40,16 +45,18 @@ void mode(char* mode, int argc, char **argv);
#define GETOPT_HELP GETOPT_FLAG( OPT_HELP, 'h' ) #define GETOPT_HELP GETOPT_FLAG( OPT_HELP, 'h' )
#define GETOPT_DENY GETOPT_FLAG( OPT_DENY, 'd' ) #define GETOPT_DENY GETOPT_FLAG( OPT_DENY, 'd' )
#define GETOPT_ADDR GETOPT_ARG( OPT_ADDR, 'l' ) #define GETOPT_ADDR GETOPT_ARG( OPT_ADDR, 'l' )
#define GETOPT_REBIND_ADDR GETOPT_ARG( OPT_REBIND_ADDR, 'L')
#define GETOPT_PORT GETOPT_ARG( OPT_PORT, 'p' ) #define GETOPT_PORT GETOPT_ARG( OPT_PORT, 'p' )
#define GETOPT_REBIND_PORT GETOPT_ARG( OPT_REBIND_PORT, 'P')
#define GETOPT_FILE GETOPT_ARG( OPT_FILE, 'f' ) #define GETOPT_FILE GETOPT_ARG( OPT_FILE, 'f' )
#define GETOPT_SOCK GETOPT_ARG( OPT_SOCK, 's' ) #define GETOPT_SOCK GETOPT_ARG( OPT_SOCK, 's' )
#define GETOPT_FROM GETOPT_ARG( OPT_FROM, 'F' ) #define GETOPT_FROM GETOPT_ARG( OPT_FROM, 'F' )
#define GETOPT_SIZE GETOPT_ARG( OPT_SIZE, 'S' ) #define GETOPT_SIZE GETOPT_ARG( OPT_SIZE, 'S' )
#define GETOPT_BIND GETOPT_ARG( OPT_BIND, 'b' ) #define GETOPT_BIND GETOPT_ARG( OPT_BIND, 'b' )
#define GETOPT_UNLINK GETOPT_FLAG( OPT_UNLINK, 'u' )
#define GETOPT_CONNECT_ADDR GETOPT_ARG( OPT_CONNECT_ADDR, 'C' )
#define GETOPT_CONNECT_PORT GETOPT_ARG( OPT_CONNECT_PORT, 'P' )
#define GETOPT_KILLSWITCH GETOPT_FLAG( OPT_KILLSWITCH, 'k' )
#define GETOPT_MAX_SPEED GETOPT_ARG( OPT_MAX_SPEED, 'm' )
#define OPT_VERBOSE "verbose" #define OPT_VERBOSE "verbose"
#define SOPT_VERBOSE "v" #define SOPT_VERBOSE "v"
@@ -63,6 +70,8 @@ void mode(char* mode, int argc, char **argv);
# define VERBOSE_LOG_LEVEL 1 # define VERBOSE_LOG_LEVEL 1
#endif #endif
#define QUIET_LOG_LEVEL 4
#define OPT_QUIET "quiet" #define OPT_QUIET "quiet"
#define SOPT_QUIET "q" #define SOPT_QUIET "q"
#define GETOPT_QUIET GETOPT_FLAG( OPT_QUIET, 'q' ) #define GETOPT_QUIET GETOPT_FLAG( OPT_QUIET, 'q' )
@@ -76,8 +85,10 @@ void mode(char* mode, int argc, char **argv);
"\t--" OPT_SOCK ",-s <SOCK>\tPath to the control socket.\n" "\t--" OPT_SOCK ",-s <SOCK>\tPath to the control socket.\n"
#define BIND_LINE \ #define BIND_LINE \
"\t--" OPT_BIND ",-b <BIND-ADDR>\tBind the local socket to a particular IP address.\n" "\t--" OPT_BIND ",-b <BIND-ADDR>\tBind the local socket to a particular IP address.\n"
#define MAX_SPEED_LINE \
"\t--" OPT_MAX_SPEED ",-m <bps>\tMaximum speed of the migration, in bytes/sec.\n"
char * help_help_text; char * help_help_text;
#endif #endif


@@ -55,3 +55,4 @@ void nbd_h2r_reply( struct nbd_reply * from, struct nbd_reply_raw * to )
to->error = be32toh( from->error ); to->error = be32toh( from->error );
memcpy( to->handle, from->handle, 8 ); memcpy( to->handle, from->handle, 8 );
} }


@@ -10,7 +10,16 @@
#define REQUEST_READ 0 #define REQUEST_READ 0
#define REQUEST_WRITE 1 #define REQUEST_WRITE 1
#define REQUEST_DISCONNECT 2 #define REQUEST_DISCONNECT 2
#define REQUEST_ENTRUST (1<<16)
/* The top 2 bytes of the type field are overloaded and can contain flags */
#define REQUEST_MASK 0x0000ffff
/* 1 MiB is the de facto standard for the maximum size of header + data */
#define NBD_MAX_SIZE ( 1024 * 1024 )
#define NBD_REQUEST_SIZE ( sizeof( struct nbd_request_raw ) )
#define NBD_REPLY_SIZE ( sizeof( struct nbd_reply_raw ) )
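These definitions split the 32-bit `type` field in two. The low 16 bits (`REQUEST_MASK`) hold the command, and the high 16 bits hold flags such as `REQUEST_ENTRUST`. This is why the proxy tests `( req->type & REQUEST_MASK ) == REQUEST_READ` rather than comparing `type` directly. The proxy also allocates its request buffer with `NBD_MAX_SIZE`, so a header plus its payload has to fit in 1 MiB.

The following sketch shows both checks. The helper names are hypothetical. `SKETCH_NBD_REQUEST_SIZE` assumes the raw request is the packed `magic`/`type`/`handle`/`from`/`len` layout of `struct nbd_request`, which comes to 28 bytes.

```c
#include <stdint.h>

#define REQUEST_READ 0
#define REQUEST_WRITE 1
#define REQUEST_DISCONNECT 2
#define REQUEST_ENTRUST (1<<16)
#define REQUEST_MASK 0x0000ffff
#define NBD_MAX_SIZE ( 1024 * 1024 )

/* Assumed wire size: magic(4) + type(4) + handle(8) + from(8) + len(4). */
#define SKETCH_NBD_REQUEST_SIZE 28

/* Low 16 bits: the command (READ, WRITE, DISCONNECT). */
static inline uint32_t request_command( uint32_t type )
{
    return type & REQUEST_MASK;
}

/* High 16 bits: flags such as REQUEST_ENTRUST. */
static inline uint32_t request_flags( uint32_t type )
{
    return type & ~(uint32_t) REQUEST_MASK;
}

/* Does a request header plus `len` payload bytes fit one NBD_MAX_SIZE buffer? */
static inline int request_fits_buffer( uint32_t len )
{
    return (uint64_t) SKETCH_NBD_REQUEST_SIZE + len <= NBD_MAX_SIZE;
}
```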
#include <linux/types.h> #include <linux/types.h>
#include <inttypes.h> #include <inttypes.h>
@@ -52,7 +61,7 @@ struct nbd_init {
struct nbd_request { struct nbd_request {
uint32_t magic; uint32_t magic;
uint32_t type; /* == READ || == WRITE */ uint32_t type; /* == READ || == WRITE || == DISCONNECT */
char handle[8]; char handle[8];
uint64_t from; uint64_t from;
uint32_t len; uint32_t len;
@@ -64,7 +73,6 @@ struct nbd_reply {
char handle[8]; /* handle you got from request */ char handle[8]; /* handle you got from request */
}; };
void nbd_r2h_init( struct nbd_init_raw * from, struct nbd_init * to ); void nbd_r2h_init( struct nbd_init_raw * from, struct nbd_init * to );
void nbd_r2h_request( struct nbd_request_raw *from, struct nbd_request * to ); void nbd_r2h_request( struct nbd_request_raw *from, struct nbd_request * to );
void nbd_r2h_reply( struct nbd_reply_raw * from, struct nbd_reply * to ); void nbd_r2h_reply( struct nbd_reply_raw * from, struct nbd_reply * to );


@@ -8,6 +8,7 @@ int atoi(const char *nptr);
((x) >= 'A' && (x) <= 'F' ) || \ ((x) >= 'A' && (x) <= 'F' ) || \
(x) == ':' || (x) == '.' \ (x) == ':' || (x) == '.' \
) )
/* FIXME: should change this to return negative on error like everything else */ /* FIXME: should change this to return negative on error like everything else */
int parse_ip_to_sockaddr(struct sockaddr* out, char* src) int parse_ip_to_sockaddr(struct sockaddr* out, char* src)
{ {
@@ -47,6 +48,22 @@ int parse_ip_to_sockaddr(struct sockaddr* out, char* src)
return 0; return 0;
} }
int parse_to_sockaddr(struct sockaddr* out, char* address)
{
struct sockaddr_un* un = (struct sockaddr_un*) out;
NULLCHECK( address );
if ( address[0] == '/' ) {
un->sun_family = AF_UNIX;
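/* Copy at most sizeof(sun_path) - 1 bytes so the path is always NUL-terminated */
strncpy( un->sun_path, address, sizeof( un->sun_path ) - 1 );
un->sun_path[ sizeof( un->sun_path ) - 1 ] = '\0';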
return 1;
}
return parse_ip_to_sockaddr( out, address );
}
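`parse_to_sockaddr` treats any address that begins with `/` as a UNIX socket path and passes everything else to `parse_ip_to_sockaddr`. `sun_path` is a fixed-size array (108 bytes on Linux), so an over-long path should be rejected rather than silently truncated.

The following sketch of the UNIX branch does that. `sketch_parse_unix_path` is hypothetical. Like `parse_to_sockaddr`, it returns 1 on success and 0 on failure.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Hypothetical UNIX-socket branch of parse_to_sockaddr(): accept only
 * absolute paths that fit in sun_path together with their NUL. */
int sketch_parse_unix_path( struct sockaddr_un *un, const char *path )
{
    size_t len = strlen( path );

    if ( path[0] != '/' ) { return 0; }               /* not a UNIX path */
    if ( len >= sizeof( un->sun_path ) ) { return 0; } /* would be truncated */

    memset( un, 0, sizeof( *un ) );
    un->sun_family = AF_UNIX;
    memcpy( un->sun_path, path, len + 1 );            /* includes the '\0' */
    return 1;
}
```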
int parse_acl(struct ip_and_mask (**out)[], int max, char **entries) int parse_acl(struct ip_and_mask (**out)[], int max, char **entries)
{ {
struct ip_and_mask* list; struct ip_and_mask* list;


@@ -2,6 +2,8 @@
#define PARSE_H #define PARSE_H
#include <sys/socket.h> #include <sys/socket.h>
#include <sys/un.h>
#include <arpa/inet.h> #include <arpa/inet.h>
#include <unistd.h> #include <unistd.h>
@@ -10,6 +12,7 @@ union mysockaddr {
struct sockaddr generic; struct sockaddr generic;
struct sockaddr_in v4; struct sockaddr_in v4;
struct sockaddr_in6 v6; struct sockaddr_in6 v6;
struct sockaddr_un un;
}; };
struct ip_and_mask { struct ip_and_mask {
@@ -18,6 +21,7 @@ struct ip_and_mask {
}; };
int parse_ip_to_sockaddr(struct sockaddr* out, char* src); int parse_ip_to_sockaddr(struct sockaddr* out, char* src);
int parse_to_sockaddr(struct sockaddr* out, char* src);
int parse_acl(struct ip_and_mask (**out)[], int max, char **entries); int parse_acl(struct ip_and_mask (**out)[], int max, char **entries);
void parse_port( char *s_port, struct sockaddr_in *out ); void parse_port( char *s_port, struct sockaddr_in *out );

14
src/prefetch.h Normal file

@@ -0,0 +1,14 @@
#ifndef PREFETCH_H
#define PREFETCH_H
#define PREFETCH_BUFSIZE 4096
struct prefetch {
int is_full;
__be64 from;
__be32 len;
char buffer[PREFETCH_BUFSIZE];
};
#endif

157
src/proxy-main.c Normal file

@@ -0,0 +1,157 @@
#include <signal.h>
#include "mode.h"
#include "util.h"
#include "proxy.h"
static struct option proxy_options[] = {
GETOPT_HELP,
GETOPT_ADDR,
GETOPT_PORT,
GETOPT_CONNECT_ADDR,
GETOPT_CONNECT_PORT,
GETOPT_BIND,
GETOPT_QUIET,
GETOPT_VERBOSE,
{0}
};
static char proxy_short_options[] = "hl:p:C:P:b:" SOPT_QUIET SOPT_VERBOSE;
static char proxy_help_text[] =
"Usage: flexnbd-proxy <options>\n\n"
"Resiliently proxy an NBD connection between client and server.\n"
"We can listen on TCP or UNIX socket, but only connect to TCP servers.\n\n"
HELP_LINE
"\t--" OPT_ADDR ",-l <ADDR>\tThe address we will bind to as a proxy.\n"
"\t--" OPT_PORT ",-p <PORT>\tThe port we will bind to as a proxy, if required.\n"
"\t--" OPT_CONNECT_ADDR ",-C <ADDR>\tAddress of the proxied server.\n"
"\t--" OPT_CONNECT_PORT ",-P <PORT>\tPort of the proxied server.\n"
"\t--" OPT_BIND ",-b <ADDR>\tThe address we connect from, as a proxy.\n"
QUIET_LINE
VERBOSE_LINE;
void read_proxy_param(
int c,
char **downstream_addr,
char **downstream_port,
char **upstream_addr,
char **upstream_port,
char **bind_addr )
{
switch( c ) {
case 'h' :
fprintf( stdout, "%s\n", proxy_help_text );
exit( 0 );
break;
case 'l':
*downstream_addr = optarg;
break;
case 'p':
*downstream_port = optarg;
break;
case 'C':
*upstream_addr = optarg;
break;
case 'P':
*upstream_port = optarg;
break;
case 'b':
*bind_addr = optarg;
break;
case 'q':
log_level = QUIET_LOG_LEVEL;
break;
case 'v':
log_level = VERBOSE_LOG_LEVEL;
break;
default:
exit_err( proxy_help_text );
break;
}
}
struct proxier * proxy = NULL;
void my_exit(int signum)
{
info( "Exit signalled (%i)", signum );
if ( NULL != proxy ) {
proxy_cleanup( proxy );
}
exit( 0 );
}
int main( int argc, char *argv[] )
{
int c;
char *downstream_addr = NULL;
char *downstream_port = NULL;
char *upstream_addr = NULL;
char *upstream_port = NULL;
char *bind_addr = NULL;
int success;
sigset_t mask;
struct sigaction exit_action;
sigemptyset( &mask );
sigaddset( &mask, SIGTERM );
sigaddset( &mask, SIGQUIT );
sigaddset( &mask, SIGINT );
exit_action.sa_handler = my_exit;
exit_action.sa_mask = mask;
exit_action.sa_flags = 0;
while (1) {
c = getopt_long( argc, argv, proxy_short_options, proxy_options, NULL );
if ( -1 == c ) { break; }
read_proxy_param( c,
&downstream_addr,
&downstream_port,
&upstream_addr,
&upstream_port,
&bind_addr
);
}
if ( NULL == downstream_addr ){
fprintf( stderr, "--addr is required.\n" );
exit_err( proxy_help_text );
} else if ( NULL == upstream_addr || NULL == upstream_port ){
fprintf( stderr, "both --conn-addr and --conn-port are required.\n" );
exit_err( proxy_help_text );
}
proxy = proxy_create(
downstream_addr,
downstream_port,
upstream_addr,
upstream_port,
bind_addr
);
/* Set these *after* proxy has been assigned to */
sigaction(SIGTERM, &exit_action, NULL);
sigaction(SIGQUIT, &exit_action, NULL);
sigaction(SIGINT, &exit_action, NULL);
signal(SIGPIPE, SIG_IGN); /* calls to splice() unhelpfully throw this */
if ( NULL != downstream_port ) {
info(
"Proxying between %s %s (downstream) and %s %s (upstream)",
downstream_addr, downstream_port, upstream_addr, upstream_port
);
} else {
info(
"Proxying between %s (downstream) and %s %s (upstream)",
downstream_addr, upstream_addr, upstream_port
);
}
success = do_proxy( proxy );
proxy_destroy( proxy );
return success ? 0 : 1;
}

933
src/proxy.c Normal file

@@ -0,0 +1,933 @@
#include "proxy.h"
#include "readwrite.h"
#ifdef PREFETCH
#include "prefetch.h"
#endif
#include "ioutil.h"
#include "sockutil.h"
#include "util.h"
#include <errno.h>
#include <sys/socket.h>
#include <netinet/tcp.h>
struct proxier* proxy_create(
char* s_downstream_address,
char* s_downstream_port,
char* s_upstream_address,
char* s_upstream_port,
char* s_upstream_bind )
{
struct proxier* out;
out = xmalloc( sizeof( struct proxier ) );
FATAL_IF_NULL(s_downstream_address, "Listen address not specified");
NULLCHECK( s_downstream_address );
FATAL_UNLESS(
parse_to_sockaddr( &out->listen_on.generic, s_downstream_address ),
"Couldn't parse downstream address '%s'",
s_downstream_address
);
if ( out->listen_on.family != AF_UNIX ) {
FATAL_IF_NULL( s_downstream_port, "Downstream port not specified" );
NULLCHECK( s_downstream_port );
parse_port( s_downstream_port, &out->listen_on.v4 );
}
FATAL_IF_NULL(s_upstream_address, "Upstream address not specified");
NULLCHECK( s_upstream_address );
FATAL_UNLESS(
parse_ip_to_sockaddr( &out->connect_to.generic, s_upstream_address ),
"Couldn't parse upstream address '%s'",
s_upstream_address
);
FATAL_IF_NULL( s_upstream_port, "Upstream port not specified" );
NULLCHECK( s_upstream_port );
parse_port( s_upstream_port, &out->connect_to.v4 );
if ( s_upstream_bind ) {
FATAL_IF_ZERO(
parse_ip_to_sockaddr( &out->connect_from.generic, s_upstream_bind ),
"Couldn't parse bind address '%s'",
s_upstream_bind
);
out->bind = 1;
}
out->listen_fd = -1;
out->downstream_fd = -1;
out->upstream_fd = -1;
#ifdef PREFETCH
out->prefetch = xmalloc( sizeof( struct prefetch ) );
#endif
out->init.buf = xmalloc( sizeof( struct nbd_init_raw ) );
out->req.buf = xmalloc( NBD_MAX_SIZE );
out->rsp.buf = xmalloc( NBD_MAX_SIZE );
return out;
}
void proxy_destroy( struct proxier* proxy )
{
free( proxy->init.buf );
free( proxy->req.buf );
free( proxy->rsp.buf );
#ifdef PREFETCH
free( proxy->prefetch );
#endif
free( proxy );
}
/* Shared between our two different connect_to_upstream paths */
void proxy_finish_connect_to_upstream( struct proxier *proxy, off64_t size );
/* Try to establish a connection to our upstream server. Return 1 on success,
* 0 on failure. This is a blocking call that returns a non-blocking socket.
*/
int proxy_connect_to_upstream( struct proxier* proxy )
{
struct sockaddr* connect_from = NULL;
if ( proxy->bind ) {
connect_from = &proxy->connect_from.generic;
}
int fd = socket_connect( &proxy->connect_to.generic, connect_from );
off64_t size = 0;
if ( -1 == fd ) {
return 0;
}
if( !socket_nbd_read_hello( fd, &size ) ) {
WARN_IF_NEGATIVE(
sock_try_close( fd ),
"Couldn't close() after failed read of NBD hello on fd %i", fd
);
return 0;
}
proxy->upstream_fd = fd;
sock_set_nonblock( fd, 1 );
proxy_finish_connect_to_upstream( proxy, size );
return 1;
}
/* First half of non-blocking connection to upstream. Gets as far as calling
* connect() on a non-blocking socket.
*/
void proxy_start_connect_to_upstream( struct proxier* proxy )
{
int fd, result;
struct sockaddr* from = NULL;
struct sockaddr* to = &proxy->connect_to.generic;
if ( proxy->bind ) {
from = &proxy->connect_from.generic;
}
fd = socket( to->sa_family , SOCK_STREAM, 0 );
if( fd < 0 ) {
warn( SHOW_ERRNO( "Couldn't create socket to reconnect to upstream" ) );
return;
}
info( "Beginning non-blocking connection to upstream on fd %i", fd );
if ( NULL != from ) {
if ( 0 > bind( fd, from, sockaddr_size( from ) ) ) {
warn( SHOW_ERRNO( "bind() to source address failed" ) );
}
}
result = sock_set_nonblock( fd, 1 );
if ( result == -1 ) {
warn( SHOW_ERRNO( "Failed to set upstream fd %i non-blocking", fd ) );
goto error;
}
result = connect( fd, to, sockaddr_size( to ) );
if ( result == -1 && errno != EINPROGRESS ) {
warn( SHOW_ERRNO( "Failed to start connect()ing to upstream!" ) );
goto error;
}
proxy->upstream_fd = fd;
return;
error:
if ( sock_try_close( fd ) == -1 ) {
/* Non-fatal leak, although still nasty */
warn( SHOW_ERRNO( "Failed to close fd for upstream %i", fd ) );
}
return;
}
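`proxy_start_connect_to_upstream` stops once `connect()` has returned `EINPROGRESS`. The usual way to finish a non-blocking connect is to wait until the socket is writable and then read `SO_ERROR` to learn whether the connection succeeded.

The following sketch shows that second half. `sketch_finish_nonblocking_connect` is hypothetical. The project's actual completion path is not shown in this diff.

```c
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>

/* Hypothetical second half of a non-blocking connect(). Waits up to
 * timeout_ms for fd to become writable, then reports the result of the
 * connect. Returns 0 on success, otherwise an errno value (ETIMEDOUT if the
 * socket never became writable). EINTR is reported as-is in this sketch. */
int sketch_finish_nonblocking_connect( int fd, int timeout_ms )
{
    struct pollfd pfd = { .fd = fd, .events = POLLOUT };
    int err = 0;
    socklen_t len = sizeof( err );
    int rc = poll( &pfd, 1, timeout_ms );

    if ( rc == 0 ) { return ETIMEDOUT; }
    if ( rc < 0 ) { return errno; }

    /* SO_ERROR holds the pending error from the asynchronous connect. */
    if ( getsockopt( fd, SOL_SOCKET, SO_ERROR, &err, &len ) == -1 ) {
        return errno;
    }
    return err;
}
```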
void proxy_finish_connect_to_upstream( struct proxier *proxy, off64_t size ) {
if ( proxy->upstream_size == 0 ) {
info( "Size of upstream image is %"PRIu64" bytes", size );
} else if ( proxy->upstream_size != size ) {
warn(
"Size changed from %"PRIu64" to %"PRIu64" bytes",
proxy->upstream_size, size
);
}
proxy->upstream_size = size;
info( "Connected to upstream on fd %i", proxy->upstream_fd );
return;
}
void proxy_disconnect_from_upstream( struct proxier* proxy )
{
if ( -1 != proxy->upstream_fd ) {
info("Closing upstream connection on fd %i", proxy->upstream_fd );
/* TODO: An NBD disconnect would be pleasant here */
WARN_IF_NEGATIVE(
sock_try_close( proxy->upstream_fd ),
"Failed to close() fd %i when disconnecting from upstream",
proxy->upstream_fd
);
proxy->upstream_fd = -1;
}
}
/** Prepares a listening socket for the NBD server, binding etc. */
void proxy_open_listen_socket(struct proxier* params)
{
NULLCHECK( params );
params->listen_fd = socket(params->listen_on.family, SOCK_STREAM, 0);
FATAL_IF_NEGATIVE(
params->listen_fd, SHOW_ERRNO( "Couldn't create listen socket" )
);
/* Allow us to restart quickly */
FATAL_IF_NEGATIVE(
sock_set_reuseaddr(params->listen_fd, 1),
SHOW_ERRNO( "Couldn't set SO_REUSEADDR" )
);
if( AF_UNIX != params->listen_on.family ) {
FATAL_IF_NEGATIVE(
sock_set_tcp_nodelay(params->listen_fd, 1),
SHOW_ERRNO( "Couldn't set TCP_NODELAY" )
);
}
FATAL_UNLESS_ZERO(
sock_try_bind( params->listen_fd, &params->listen_on.generic ),
SHOW_ERRNO( "Failed to bind to listening socket" )
);
/* We're only serving one client at a time, hence backlog of 1 */
FATAL_IF_NEGATIVE(
listen(params->listen_fd, 1),
SHOW_ERRNO( "Failed to listen on listening socket" )
);
info( "Now listening for incoming connections" );
return;
}
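`proxy_open_listen_socket` sets up its socket in four steps:

1. `SO_REUSEADDR`, for quick restarts.
2. `TCP_NODELAY`, only when the family is not `AF_UNIX`.
3. `bind()`.
4. `listen()` with a backlog of 1, because only one client is served at a time.

The following sketch performs the same sequence with plain POSIX calls instead of the project's `sock_set_*` and `sock_try_bind` helpers. It returns an error code instead of exiting. `sketch_open_listen_socket` is a hypothetical name.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical POSIX-only version of proxy_open_listen_socket().
 * Returns the listening fd, or -1 on failure. */
int sketch_open_listen_socket( const struct sockaddr *addr, socklen_t addrlen )
{
    int one = 1;
    int fd = socket( addr->sa_family, SOCK_STREAM, 0 );
    if ( fd < 0 ) { return -1; }

    /* Allow a quick restart on the same address. */
    if ( setsockopt( fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof( one ) ) < 0 ) {
        goto fail;
    }
    /* Nagle's algorithm only applies to TCP, so skip it for AF_UNIX. */
    if ( addr->sa_family != AF_UNIX &&
         setsockopt( fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof( one ) ) < 0 ) {
        goto fail;
    }
    if ( bind( fd, addr, addrlen ) < 0 ) { goto fail; }
    /* One client at a time, hence a backlog of 1. */
    if ( listen( fd, 1 ) < 0 ) { goto fail; }
    return fd;

fail:
    close( fd );
    return -1;
}
```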
typedef enum {
EXIT,
WRITE_TO_DOWNSTREAM,
READ_FROM_DOWNSTREAM,
CONNECT_TO_UPSTREAM,
READ_INIT_FROM_UPSTREAM,
WRITE_TO_UPSTREAM,
READ_FROM_UPSTREAM
} proxy_session_states;
static char* proxy_session_state_names[] = {
"EXIT",
"WRITE_TO_DOWNSTREAM",
"READ_FROM_DOWNSTREAM",
"CONNECT_TO_UPSTREAM",
"READ_INIT_FROM_UPSTREAM",
"WRITE_TO_UPSTREAM",
"READ_FROM_UPSTREAM"
};
static inline int proxy_state_upstream( int state )
{
return state == CONNECT_TO_UPSTREAM || state == READ_INIT_FROM_UPSTREAM ||
state == WRITE_TO_UPSTREAM || state == READ_FROM_UPSTREAM;
}
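`proxy_session_state_names` has to stay index-for-index in step with `proxy_session_states`. If a new state is added to the enum but not to the array, the logs will print the wrong name or read past the end of the array. One way to make that mistake fail at compile time is a sentinel enumerator plus a static assertion.

The following sketch shows that approach. `PROXY_STATE_COUNT` and `proxy_state_name` are hypothetical additions that do not exist in the original code.

```c
#include <stddef.h>

typedef enum {
    EXIT,
    WRITE_TO_DOWNSTREAM,
    READ_FROM_DOWNSTREAM,
    CONNECT_TO_UPSTREAM,
    READ_INIT_FROM_UPSTREAM,
    WRITE_TO_UPSTREAM,
    READ_FROM_UPSTREAM,
    PROXY_STATE_COUNT  /* hypothetical sentinel, not in the original enum */
} proxy_session_states;

static const char *proxy_session_state_names[] = {
    "EXIT",
    "WRITE_TO_DOWNSTREAM",
    "READ_FROM_DOWNSTREAM",
    "CONNECT_TO_UPSTREAM",
    "READ_INIT_FROM_UPSTREAM",
    "WRITE_TO_UPSTREAM",
    "READ_FROM_UPSTREAM"
};

/* Compile-time check: exactly one name per state. */
_Static_assert(
    sizeof( proxy_session_state_names ) / sizeof( proxy_session_state_names[0] )
        == PROXY_STATE_COUNT,
    "proxy_session_state_names is out of sync with proxy_session_states" );

/* Bounds-checked lookup for log messages. */
const char *proxy_state_name( int state )
{
    if ( state < 0 || state >= PROXY_STATE_COUNT ) { return "UNKNOWN"; }
    return proxy_session_state_names[state];
}

static inline int proxy_state_upstream( int state )
{
    return state == CONNECT_TO_UPSTREAM || state == READ_INIT_FROM_UPSTREAM ||
           state == WRITE_TO_UPSTREAM || state == READ_FROM_UPSTREAM;
}
```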
#ifdef PREFETCH
int proxy_prefetch_for_request( struct proxier* proxy, int state )
{
struct nbd_request* req = &proxy->req_hdr;
struct nbd_reply* rsp = &proxy->rsp_hdr;
struct nbd_request_raw* req_raw = (struct nbd_request_raw*) proxy->req.buf;
struct nbd_reply_raw *rsp_raw = (struct nbd_reply_raw*) proxy->rsp.buf;
int is_read = ( req->type & REQUEST_MASK ) == REQUEST_READ;
uint64_t prefetch_start = req->from;
uint64_t prefetch_end = req->from + ( (uint64_t) req->len * 2 );
/* We only want to consider prefetching if we know we're not
* getting too much data back, if it's a read request, and if
* the prefetch won't try to read past the end of the file.
*/
int prefetching = req->len <= PREFETCH_BUFSIZE && is_read &&
prefetch_start < prefetch_end && prefetch_end <= proxy->upstream_size;
if ( is_read ) {
/* See if we can respond with what's in our prefetch
* cache */
if ( proxy->prefetch->is_full &&
req->from == proxy->prefetch->from &&
req->len == proxy->prefetch->len ) {
/* HUZZAH! A match! */
debug( "Prefetch hit!" );
/* First build a reply header */
rsp->magic = REPLY_MAGIC;
rsp->error = 0;
memcpy( &rsp->handle, &req->handle, 8 );
/* now copy it into the response */
nbd_h2r_reply( rsp, rsp_raw );
/* and the data */
memcpy(
proxy->rsp.buf + NBD_REPLY_SIZE,
proxy->prefetch->buffer, proxy->prefetch->len
);
proxy->rsp.size = NBD_REPLY_SIZE + proxy->prefetch->len;
proxy->rsp.needle = 0;
/* return early, our work here is done */
return WRITE_TO_DOWNSTREAM;
}
}
else {
/* Safety catch. If we're sending a write request, we
* blow away the cache. This is very pessimistic, but
* it's simpler (and therefore safer) than working out
* whether we can keep it or not.
*/
debug( "Blowing away prefetch cache on type %"PRIu32" request.", req->type );
proxy->prefetch->is_full = 0;
}
debug( "Prefetch cache MISS!");
/* We pull the request out of the proxy struct, rewrite the
* request size, and write it back.
*/
if ( prefetching ) {
proxy->is_prefetch_req = 1;
proxy->prefetch_req_orig_len = req->len;
req->len *= 2;
debug( "Prefetching %"PRIu32" bytes", req->len - proxy->prefetch_req_orig_len );
nbd_h2r_request( req, req_raw );
}
return state;
}
int proxy_prefetch_for_reply( struct proxier* proxy, int state )
{
size_t prefetched_bytes;
if ( !proxy->is_prefetch_req ) {
return state;
}
prefetched_bytes = proxy->req_hdr.len - proxy->prefetch_req_orig_len;
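debug( "Prefetched %zu bytes", prefetched_bytes );
/* The prefetched bytes follow the reply header and the originally
 * requested data in rsp.buf; copy them into the prefetch cache */
memcpy(
proxy->prefetch->buffer,
proxy->rsp.buf + NBD_REPLY_SIZE + proxy->prefetch_req_orig_len,
prefetched_bytes
);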
proxy->prefetch->from = proxy->req_hdr.from + proxy->prefetch_req_orig_len;
proxy->prefetch->len = prefetched_bytes;
/* We've finished with proxy->req by now, so don't need to alter it to make
* it look like the request was before prefetch */
/* Truncate the bytes we'll write downstream */
proxy->req_hdr.len = proxy->prefetch_req_orig_len;
proxy->rsp.size -= prefetched_bytes;
/* And we need to reset these */
proxy->prefetch->is_full = 1;
proxy->is_prefetch_req = 0;
return state;
}
#endif
int proxy_read_from_downstream( struct proxier *proxy, int state )
{
ssize_t count;
struct nbd_request_raw* request_raw = (struct nbd_request_raw*) proxy->req.buf;
struct nbd_request* request = &(proxy->req_hdr);
// assert( state == READ_FROM_DOWNSTREAM );
count = iobuf_read( proxy->downstream_fd, &proxy->req, NBD_REQUEST_SIZE );
if ( count == -1 ) {
warn( SHOW_ERRNO( "Couldn't read request from downstream" ) );
return EXIT;
}
if ( proxy->req.needle == NBD_REQUEST_SIZE ) {
nbd_r2h_request( request_raw, request );
if ( ( request->type & REQUEST_MASK ) == REQUEST_DISCONNECT ) {
info( "Received disconnect request from client" );
return EXIT;
}
/* Simple validations */
if ( ( request->type & REQUEST_MASK ) == REQUEST_READ ) {
if (request->len > ( NBD_MAX_SIZE - NBD_REPLY_SIZE ) ) {
warn( "NBD read request size %"PRIu32" too large", request->len );
return EXIT;
}
}
if ( (request->type & REQUEST_MASK ) == REQUEST_WRITE ) {
if (request->len > ( NBD_MAX_SIZE - NBD_REQUEST_SIZE ) ) {
warn( "NBD write request size %"PRIu32" too large", request->len );
return EXIT;
}
proxy->req.size += request->len;
}
}
if ( proxy->req.needle == proxy->req.size ) {
debug(
"Received NBD request from downstream. type=%"PRIu32" from=%"PRIu64" len=%"PRIu32,
request->type, request->from, request->len
);
/* Finished reading, so advance state. Leave size untouched so the next
* state knows how many bytes to write */
proxy->req.needle = 0;
return WRITE_TO_UPSTREAM;
}
return state;
}
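proxy_read_from_downstream assembles a request across several non-blocking reads. It first waits for the fixed-size header; then, for writes only, it grows the expected size by the payload length. The iobuf type is not shown here, so this sketch uses a hypothetical req_buf with the same size/needle idea. It assumes the standard NBD request layout: a 28-byte big-endian header of magic, type, 8-byte handle, 64-bit offset and 32-bit length, with write as command 1 in the low 16 bits of type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define REQ_HDR_SIZE   28u      /* magic(4) type(4) handle(8) from(8) len(4) */
#define REQ_TYPE_WRITE 1u
#define REQ_TYPE_MASK  0xffffu  /* assumed: low 16 bits carry the command */

/* Hypothetical reassembly buffer: size is how many bytes the request will
 * occupy in total (once known), needle is how many have arrived so far. */
struct req_buf {
    unsigned char data[REQ_HDR_SIZE + 4096];
    size_t size;
    size_t needle;
};

static uint32_t be32_at( const unsigned char *p )
{
    return (uint32_t) p[0] << 24 | (uint32_t) p[1] << 16 |
           (uint32_t) p[2] << 8 | (uint32_t) p[3];
}

/* Consume bytes from one short read; *used is set to the number taken.
 * Returns 1 once a whole request is buffered, 0 if more bytes are needed,
 * and -1 if a write payload would overflow the buffer. */
static int req_feed( struct req_buf *b, const unsigned char *in, size_t n, size_t *used )
{
    size_t take;

    if ( b->size == 0 ) {
        b->size = REQ_HDR_SIZE;          /* until the header is parsed */
    }
    take = b->size - b->needle;
    if ( take > n ) {
        take = n;
    }
    memcpy( b->data + b->needle, in, take );
    b->needle += take;
    *used = take;

    /* The header has just completed: a write request carries len more bytes */
    if ( take > 0 && b->needle == REQ_HDR_SIZE ) {
        uint32_t type = be32_at( b->data + 4 ) & REQ_TYPE_MASK;
        uint32_t len  = be32_at( b->data + 24 );

        if ( type == REQ_TYPE_WRITE ) {
            if ( len > sizeof( b->data ) - REQ_HDR_SIZE ) {
                return -1;
            }
            b->size += len;
        }
    }
    return b->needle == b->size;
}
```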
int proxy_continue_connecting_to_upstream( struct proxier* proxy, int state )
{
int error, result;
socklen_t len = sizeof( error );
// assert( state == CONNECT_TO_UPSTREAM );
result = getsockopt(
proxy->upstream_fd, SOL_SOCKET, SO_ERROR, &error, &len
);
if ( result == -1 ) {
warn( SHOW_ERRNO( "Failed to tell if connected to upstream" ) );
return state;
}
if ( error != 0 ) {
errno = error;
warn( SHOW_ERRNO( "Failed to connect to upstream" ) );
return state;
}
#ifdef PREFETCH
/* Data may have changed while we were disconnected */
proxy->prefetch->is_full = 0;
#endif
info( "Connected to upstream on fd %i", proxy->upstream_fd );
return READ_INIT_FROM_UPSTREAM;
}
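proxy_continue_connecting_to_upstream relies on the usual non-blocking connect() idiom: wait for the socket to become writable, then ask SO_ERROR whether the connection actually succeeded. proxy_start_connect_to_upstream is not shown in this section, so the following is a minimal POSIX sketch of both halves with hypothetical names. It is not flexnbd's own code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Start a non-blocking connect(). Returns the fd (possibly still
 * connecting), or -1 with errno set. */
static int start_connect( const struct sockaddr *addr, socklen_t addrlen )
{
    int fd, flags, saved_errno;

    assert( addr != NULL );
    fd = socket( addr->sa_family, SOCK_STREAM, 0 );
    if ( fd == -1 ) {
        return -1;
    }
    flags = fcntl( fd, F_GETFL );
    if ( flags == -1 || fcntl( fd, F_SETFL, flags | O_NONBLOCK ) == -1 ) {
        goto fail;
    }
    /* EINPROGRESS just means "finish later"; anything else is a failure */
    if ( connect( fd, addr, addrlen ) == -1 && errno != EINPROGRESS ) {
        goto fail;
    }
    return fd;

fail:
    saved_errno = errno;
    close( fd );
    errno = saved_errno;
    return -1;
}

/* Wait up to timeout_secs for the connect() to finish. Writability only
 * means it finished; SO_ERROR says whether it succeeded. */
static int finish_connect( int fd, int timeout_secs )
{
    fd_set wfds;
    struct timeval tv = { .tv_sec = timeout_secs, .tv_usec = 0 };
    int error = 0;
    socklen_t len = sizeof( error );
    int rc;

    FD_ZERO( &wfds );
    FD_SET( fd, &wfds );
    rc = select( fd + 1, NULL, &wfds, NULL, &tv );
    if ( rc == -1 ) {
        return -1;
    }
    if ( rc == 0 ) {
        errno = ETIMEDOUT;
        return -1;
    }
    if ( getsockopt( fd, SOL_SOCKET, SO_ERROR, &error, &len ) == -1 ) {
        return -1;
    }
    if ( error != 0 ) {
        errno = error;   /* e.g. ECONNREFUSED */
        return -1;
    }
    return 0;
}
```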
int proxy_read_init_from_upstream( struct proxier* proxy, int state )
{
ssize_t count;
// assert( state == READ_INIT_FROM_UPSTREAM );
count = iobuf_read( proxy->upstream_fd, &proxy->init, sizeof( struct nbd_init_raw ) );
if ( count == -1 ) {
warn( SHOW_ERRNO( "Failed to read init from upstream" ) );
goto disconnect;
}
if ( proxy->init.needle == proxy->init.size ) {
off64_t upstream_size;
if ( !nbd_check_hello( (struct nbd_init_raw*) proxy->init.buf, &upstream_size ) ) {
warn( "Upstream sent invalid init" );
goto disconnect;
}
/* Currently, we only get disconnected from upstream (so needing to come
* here) when we have an outstanding request. If that becomes false,
* we'll need to choose the right state to return to here */
proxy->init.needle = 0;
return WRITE_TO_UPSTREAM;
}
return state;
disconnect:
proxy->init.needle = 0;
proxy->init.size = 0;
return CONNECT_TO_UPSTREAM;
}
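nbd_check_hello validates a hello that has already been read into memory, which lets proxy_read_init_from_upstream collect it in non-blocking chunks. INIT_PASSWD and INIT_MAGIC are not defined in this section. The sketch below assumes the classic (old-style) NBD values, "NBDMAGIC" and 0x00420281861253. It also assumes a 152-byte layout: an 8-byte password, an 8-byte magic, an 8-byte big-endian size and 128 reserved bytes. The names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HELLO_SIZE  152   /* 8 passwd + 8 magic + 8 size + 128 reserved */
#define HELLO_MAGIC UINT64_C( 0x00420281861253 )

static uint64_t be64_at( const unsigned char *p )
{
    uint64_t v = 0;
    for ( int i = 0; i < 8; i++ ) {
        v = ( v << 8 ) | p[i];
    }
    return v;
}

/* Validate a complete hello held in buf (HELLO_SIZE bytes). On success,
 * store the advertised export size and return 1; otherwise return 0. */
static int check_hello( const unsigned char *buf, uint64_t *out_size )
{
    assert( buf != NULL );
    if ( memcmp( buf, "NBDMAGIC", 8 ) != 0 ) {
        return 0;
    }
    if ( be64_at( buf + 8 ) != HELLO_MAGIC ) {
        return 0;
    }
    if ( out_size != NULL ) {
        *out_size = be64_at( buf + 16 );
    }
    return 1;
}
```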
int proxy_write_to_upstream( struct proxier* proxy, int state )
{
ssize_t count;
// assert( state == WRITE_TO_UPSTREAM );
count = iobuf_write( proxy->upstream_fd, &proxy->req );
if ( count == -1 ) {
warn( SHOW_ERRNO( "Failed to send request to upstream" ) );
proxy->req.needle = 0;
return CONNECT_TO_UPSTREAM;
}
if ( proxy->req.needle == proxy->req.size ) {
/* Request sent. Advance to reading the response from upstream. We might
* still need req.size if reading the reply fails - we disconnect
* and resend the request in that case - so keep it around for now. */
proxy->req.needle = 0;
return READ_FROM_UPSTREAM;
}
return state;
}
int proxy_read_from_upstream( struct proxier* proxy, int state )
{
ssize_t count;
struct nbd_reply* reply = &(proxy->rsp_hdr);
struct nbd_reply_raw* reply_raw = (struct nbd_reply_raw*) proxy->rsp.buf;
/* We can't assume the NBD_REPLY_SIZE + req->len is what we'll get back */
count = iobuf_read( proxy->upstream_fd, &proxy->rsp, NBD_REPLY_SIZE );
if ( count == -1 ) {
warn( SHOW_ERRNO( "Failed to get reply from upstream" ) );
goto disconnect;
}
if ( proxy->rsp.needle == NBD_REPLY_SIZE ) {
nbd_r2h_reply( reply_raw, reply );
if ( reply->magic != REPLY_MAGIC ) {
warn( "Reply magic is incorrect" );
goto disconnect;
}
if ( reply->error != 0 ) {
warn( "NBD error returned from upstream: %"PRIu32, reply->error );
goto disconnect;
}
if ( ( proxy->req_hdr.type & REQUEST_MASK ) == REQUEST_READ ) {
/* Get the read reply data too. */
proxy->rsp.size += proxy->req_hdr.len;
}
}
if ( proxy->rsp.size == proxy->rsp.needle ) {
debug( "NBD reply received from upstream." );
proxy->rsp.needle = 0;
return WRITE_TO_DOWNSTREAM;
}
return state;
disconnect:
proxy->rsp.needle = 0;
proxy->rsp.size = 0;
return CONNECT_TO_UPSTREAM;
}
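proxy_read_from_upstream reads the reply header first and only then learns how much more to read. The sketch below shows that calculation with hypothetical names. It assumes the standard NBD simple reply: a 16-byte big-endian header of magic 0x67446698, error code and 8-byte handle, followed by data only for reads.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REPLY_HDR_SIZE     16u           /* magic(4) error(4) handle(8) */
#define SKETCH_REPLY_MAGIC 0x67446698u   /* assumed NBD simple-reply magic */

static uint32_t be32_at( const unsigned char *p )
{
    return (uint32_t) p[0] << 24 | (uint32_t) p[1] << 16 |
           (uint32_t) p[2] << 8 | (uint32_t) p[3];
}

/* Given a complete reply header, return how many bytes the whole reply
 * occupies, or 0 if it must be rejected (bad magic or upstream error),
 * mirroring the proxy's decision to disconnect in both cases. */
static size_t reply_total_size( const unsigned char *hdr, int request_was_read,
                                uint32_t request_len )
{
    assert( hdr != NULL );
    if ( be32_at( hdr ) != SKETCH_REPLY_MAGIC ) {
        return 0;
    }
    if ( be32_at( hdr + 4 ) != 0 ) {
        return 0;
    }
    /* Only read replies carry data, and exactly as much as was requested */
    return (size_t) REPLY_HDR_SIZE + ( request_was_read ? (size_t) request_len : 0 );
}
```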
int proxy_write_to_downstream( struct proxier* proxy, int state )
{
ssize_t count;
// assert( state == WRITE_TO_DOWNSTREAM );
if ( !proxy->hello_sent ) {
info( "Writing init to downstream" );
}
count = iobuf_write( proxy->downstream_fd, &proxy->rsp );
if ( count == -1 ) {
warn( SHOW_ERRNO( "Failed to write to downstream" ) );
return EXIT;
}
if ( proxy->rsp.needle == proxy->rsp.size ) {
if ( !proxy->hello_sent ) {
info( "Hello message sent to client" );
proxy->hello_sent = 1;
} else {
debug( "Reply sent" );
proxy->req_count++;
}
/* We're done with the request & response buffers now */
proxy->req.size = 0;
proxy->req.needle = 0;
proxy->rsp.size = 0;
proxy->rsp.needle = 0;
return READ_FROM_DOWNSTREAM;
}
return state;
}
/* Non-blocking proxy session. Simple(ish) state machine. We read from d/s until
* we have a full request, then try to write that request u/s. If writing fails,
* we reconnect to upstream and retry. Once we've successfully written, we
* attempt to read the reply. If that fails or times out (we give it 30 seconds)
* then we disconnect from u/s and go back to trying to reconnect and resend.
*
* This is the second-simplest NBD proxy I can think of. The first version used
* blocking I/O, but it was getting impossible to manage exceptional stuff.
*/
void proxy_session( struct proxier* proxy )
{
uint64_t state_started = monotonic_time_ms();
int old_state = EXIT;
int state;
int connect_to_upstream_cooldown = 0;
/* First action: Write hello to downstream */
nbd_hello_to_buf( (struct nbd_init_raw *) proxy->rsp.buf, proxy->upstream_size );
proxy->rsp.size = sizeof( struct nbd_init_raw );
proxy->rsp.needle = 0;
state = WRITE_TO_DOWNSTREAM;
info( "Beginning proxy session on fd %i", proxy->downstream_fd );
while( state != EXIT ) {
struct timeval select_timeout = {
.tv_sec = 0,
.tv_usec = 0
};
struct timeval *select_timeout_ptr = NULL;
int result; /* used by select() */
fd_set rfds;
fd_set wfds;
FD_ZERO( &rfds );
FD_ZERO( &wfds );
if ( state != old_state ) {
state_started = monotonic_time_ms();
debug(
"State transition from %s to %s",
proxy_session_state_names[old_state],
proxy_session_state_names[state]
);
} else {
debug( "Proxy is in state %s", proxy_session_state_names[state] );
}
old_state = state;
switch( state ) {
case READ_FROM_DOWNSTREAM:
FD_SET( proxy->downstream_fd, &rfds );
break;
case WRITE_TO_DOWNSTREAM:
FD_SET( proxy->downstream_fd, &wfds );
break;
case WRITE_TO_UPSTREAM:
select_timeout.tv_sec = 15;
FD_SET(proxy->upstream_fd, &wfds );
break;
case CONNECT_TO_UPSTREAM:
proxy_disconnect_from_upstream( proxy );
/* upstream_fd is now -1 */
if ( connect_to_upstream_cooldown ) {
connect_to_upstream_cooldown = 0;
select_timeout.tv_sec = 3;
} else {
proxy_start_connect_to_upstream( proxy );
if ( proxy->upstream_fd == -1 ) {
warn( SHOW_ERRNO( "Error acquiring socket to upstream" ) );
continue;
}
FD_SET( proxy->upstream_fd, &wfds );
select_timeout.tv_sec = 15;
}
break;
case READ_INIT_FROM_UPSTREAM:
case READ_FROM_UPSTREAM:
select_timeout.tv_sec = 15;
FD_SET( proxy->upstream_fd, &rfds );
break;
}
if ( select_timeout.tv_sec > 0 ) {
select_timeout_ptr = &select_timeout;
}
result = sock_try_select( FD_SETSIZE, &rfds, &wfds, NULL, select_timeout_ptr );
if ( result == -1 ) {
warn( SHOW_ERRNO( "select() failed: " ) );
break;
}
/* Happens after failed reconnect. Avoid SIGBUS on FD_ISSET() */
if ( proxy->upstream_fd == -1 ) {
continue;
}
switch( state ) {
case READ_FROM_DOWNSTREAM:
if ( FD_ISSET( proxy->downstream_fd, &rfds ) ) {
state = proxy_read_from_downstream( proxy, state );
#ifdef PREFETCH
/* Check if we can fulfil the request from prefetch, or
* rewrite the request to fill the prefetch buffer if needed
*/
if ( state == WRITE_TO_UPSTREAM ) {
state = proxy_prefetch_for_request( proxy, state );
}
#endif
}
break;
case CONNECT_TO_UPSTREAM:
if ( FD_ISSET( proxy->upstream_fd, &wfds ) ) {
state = proxy_continue_connecting_to_upstream( proxy, state );
}
/* Leaving state untouched will retry connecting to upstream -
* so introduce a bit of sleep */
if ( state == CONNECT_TO_UPSTREAM ) {
connect_to_upstream_cooldown = 1;
}
break;
case READ_INIT_FROM_UPSTREAM:
state = proxy_read_init_from_upstream( proxy, state );
if ( state == CONNECT_TO_UPSTREAM ) {
connect_to_upstream_cooldown = 1;
}
break;
case WRITE_TO_UPSTREAM:
if ( FD_ISSET( proxy->upstream_fd, &wfds ) ) {
state = proxy_write_to_upstream( proxy, state );
}
break;
case READ_FROM_UPSTREAM:
if ( FD_ISSET( proxy->upstream_fd, &rfds ) ) {
state = proxy_read_from_upstream( proxy, state );
}
#ifdef PREFETCH
/* Fill the prefetch buffer and rewrite the reply, if needed */
if ( state == WRITE_TO_DOWNSTREAM ) {
state = proxy_prefetch_for_reply( proxy, state );
}
#endif
break;
case WRITE_TO_DOWNSTREAM:
if ( FD_ISSET( proxy->downstream_fd, &wfds ) ) {
state = proxy_write_to_downstream( proxy, state );
}
break;
}
/* In these states, we're interested in restarting after a timeout.
*/
if ( old_state == state && proxy_state_upstream( state ) ) {
if ( ( monotonic_time_ms() ) - state_started > UPSTREAM_TIMEOUT ) {
warn(
"Timed out in state %s while communicating with upstream",
proxy_session_state_names[state]
);
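state = CONNECT_TO_UPSTREAM;
/* A timeout bypasses the per-state error paths, so reset the partially
 * filled buffers here: the request is resent from the start, and the
 * init and reply are read afresh after reconnecting */
proxy->req.needle = 0;
proxy->init.needle = 0;
proxy->init.size = 0;
proxy->rsp.needle = 0;
proxy->rsp.size = 0;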
}
}
}
info(
"Finished proxy session on fd %i after %"PRIu64" successful request(s)",
proxy->downstream_fd, proxy->req_count
);
/* Reset these two for the next session */
proxy->req_count = 0;
proxy->hello_sent = 0;
return;
}
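The timeout check in proxy_session compares elapsed time against UPSTREAM_TIMEOUT using monotonic_time_ms(), which is not shown here. A monotonic clock suits this check because it does not jump when the system time is changed. The sketch below assumes clock_gettime with CLOCK_MONOTONIC is available, as on Linux, and uses hypothetical helper names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Milliseconds from an arbitrary fixed point; only differences are meaningful */
static uint64_t monotonic_ms( void )
{
    struct timespec ts;

    clock_gettime( CLOCK_MONOTONIC, &ts );
    return (uint64_t) ts.tv_sec * 1000 + (uint64_t) ts.tv_nsec / 1000000;
}

/* Same shape as the check in proxy_session: strictly longer than the limit */
static int state_timed_out( uint64_t started_ms, uint64_t now_ms, uint64_t limit_ms )
{
    return now_ms - started_ms > limit_ms;
}
```

Because of the strict comparison, a state that has lasted exactly the limit has not yet timed out.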
/** Accept an NBD socket connection, dispatch appropriately */
int proxy_accept( struct proxier* params )
{
NULLCHECK( params );
int client_fd;
fd_set fds;
union mysockaddr client_address;
socklen_t socklen = sizeof( client_address );
info( "Waiting for client connection" );
FD_ZERO(&fds);
FD_SET(params->listen_fd, &fds);
FATAL_IF_NEGATIVE(
sock_try_select(FD_SETSIZE, &fds, NULL, NULL, NULL),
SHOW_ERRNO( "select() failed" )
);
if ( FD_ISSET( params->listen_fd, &fds ) ) {
client_fd = accept( params->listen_fd, &client_address.generic, &socklen );
if ( client_fd == -1 ) {
warn( SHOW_ERRNO( "Failed to accept() client connection" ) );
return 1;
}
if ( client_address.family != AF_UNIX ) {
if ( sock_set_tcp_nodelay(client_fd, 1) == -1 ) {
warn( SHOW_ERRNO( "Failed to set TCP_NODELAY" ) );
}
}
info( "Accepted nbd client socket fd %d", client_fd );
sock_set_nonblock( client_fd, 1 );
params->downstream_fd = client_fd;
proxy_session( params );
WARN_IF_NEGATIVE(
sock_try_close( params->downstream_fd ),
SHOW_ERRNO(
"Couldn't close() downstream fd %i after proxy session",
params->downstream_fd
)
);
params->downstream_fd = -1;
}
return 1; /* We actually expect to be interrupted by signal handlers */
}
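accept() can fail, for example with EINTR when a signal interrupts it or EMFILE when the process runs out of descriptors. The resulting -1 must not be handed on to socket options or a session. sock_set_nonblock is not shown here, so the sketch below accepts a connection and makes it non-blocking with fcntl, under a hypothetical name. It hands EINTR back to the caller rather than retrying, in keeping with the expectation that signal handlers interrupt the accept loop.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Accept one connection and make it non-blocking. Returns the new fd, or
 * -1 with errno set (including EINTR, so a signal can end the loop). */
static int accept_nonblocking( int listen_fd )
{
    int fd, flags, saved_errno;

    fd = accept( listen_fd, NULL, NULL );
    if ( fd == -1 ) {
        return -1;
    }
    flags = fcntl( fd, F_GETFL );
    if ( flags == -1 || fcntl( fd, F_SETFL, flags | O_NONBLOCK ) == -1 ) {
        saved_errno = errno;
        close( fd );
        errno = saved_errno;
        return -1;
    }
    return fd;
}
```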
void proxy_accept_loop( struct proxier* params )
{
NULLCHECK( params );
while( proxy_accept( params ) );
}
/** Closes sockets */
void proxy_cleanup( struct proxier* proxy )
{
NULLCHECK( proxy );
info( "Cleaning up" );
if ( -1 != proxy->listen_fd ) {
if ( AF_UNIX == proxy->listen_on.family ) {
if ( -1 == unlink( proxy->listen_on.un.sun_path ) ) {
warn( SHOW_ERRNO( "Failed to unlink %s", proxy->listen_on.un.sun_path ) );
}
}
WARN_IF_NEGATIVE(
sock_try_close( proxy->listen_fd ),
SHOW_ERRNO( "Failed to close() listen fd %i", proxy->listen_fd )
);
proxy->listen_fd = -1;
}
if ( -1 != proxy->downstream_fd ) {
WARN_IF_NEGATIVE(
sock_try_close( proxy->downstream_fd ),
SHOW_ERRNO(
"Failed to close() downstream fd %i", proxy->downstream_fd
)
);
proxy->downstream_fd = -1;
}
if ( -1 != proxy->upstream_fd ) {
WARN_IF_NEGATIVE(
sock_try_close( proxy->upstream_fd ),
SHOW_ERRNO(
"Failed to close() upstream fd %i", proxy->upstream_fd
)
);
proxy->upstream_fd = -1;
}
info( "Cleanup done" );
}
/** Full lifecycle of the proxier */
int do_proxy( struct proxier* params )
{
NULLCHECK( params );
info( "Ensuring upstream server is open" );
if ( !proxy_connect_to_upstream( params ) ) {
warn( "Couldn't connect to upstream server during initialization, exiting" );
proxy_cleanup( params );
return 1;
}
proxy_open_listen_socket( params );
proxy_accept_loop( params );
proxy_cleanup( params );
return 0;
}

src/proxy.h

@@ -0,0 +1,98 @@
#ifndef PROXY_H
#define PROXY_H
#include <sys/types.h>
#include <unistd.h>
#include "ioutil.h"
#include "flexnbd.h"
#include "parse.h"
#include "nbdtypes.h"
#include "self_pipe.h"
#ifdef PREFETCH
#include "prefetch.h"
#endif
/** UPSTREAM_TIMEOUT
* How long ( in ms ) to allow for upstream to respond. If it takes longer
* than this, we will cancel the current request-response to them and resubmit.
*/
#define UPSTREAM_TIMEOUT ( 30 * 1000 )
struct proxier {
/* The flexnbd wrapper this proxier is attached to */
struct flexnbd* flexnbd;
/** address/port to bind to */
union mysockaddr listen_on;
/** address/port to connect to */
union mysockaddr connect_to;
/** address to bind to when making outgoing connections */
union mysockaddr connect_from;
int bind; /* Set to true if we should bind to connect_from */
/* The socket we listen() on and accept() against */
int listen_fd;
/* The socket returned by accept() that we receive requests from and send
* responses to
*/
int downstream_fd;
/* The socket returned by connect() that we send requests to and receive
* responses from
*/
int upstream_fd;
/* This is the size we advertise to the downstream client */
off64_t upstream_size;
/* We transform the raw request header into here */
struct nbd_request req_hdr;
/* We transform the raw reply header into here */
struct nbd_reply rsp_hdr;
/* Used for our non-blocking negotiation with upstream. TODO: maybe use
* for downstream as well ( we currently overload rsp ) */
struct iobuf init;
/* The current NBD request from downstream */
struct iobuf req;
/* The current NBD reply from upstream */
struct iobuf rsp;
/* It's starting to feel like we need an object for a single proxy session.
* These two track how many requests we've sent so far, and whether the
* NBD_INIT code has been sent to the client yet.
*/
uint64_t req_count;
int hello_sent;
#ifdef PREFETCH
/* While the in-flight request has been munged by prefetch, these two are
* set to true, and the original length of the request, respectively */
int is_prefetch_req;
uint32_t prefetch_req_orig_len;
/* And here, we actually store the prefetched data once it's returned */
struct prefetch *prefetch;
#endif
};
struct proxier* proxy_create(
char* s_downstream_address,
char* s_downstream_port,
char* s_upstream_address,
char* s_upstream_port,
char* s_upstream_bind );
int do_proxy( struct proxier* proxy );
void proxy_cleanup( struct proxier* proxy );
void proxy_destroy( struct proxier* proxy );
#endif
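Object-like macros that expand to an arithmetic expression are conventionally wrapped in parentheses. A comparison like the one in proxy_session reads correctly either way, because subtraction and > bind more loosely than *. A division does not. The two macros below are hypothetical and exist only to show the difference.

```c
#include <assert.h>

/* Hypothetical macros for illustration only */
#define TIMEOUT_BARE_MS    30 * 1000
#define TIMEOUT_PARENS_MS  ( 30 * 1000 )

/* How many whole timeout periods fit into ms? */
static long periods_bare( long ms )
{
    return ms / TIMEOUT_BARE_MS;     /* expands to ms / 30 * 1000 */
}

static long periods_parens( long ms )
{
    return ms / TIMEOUT_PARENS_MS;   /* expands to ms / ( 30 * 1000 ) */
}
```

With 90000 ms, the parenthesised version gives 3 periods. The bare one gives 3000000, because the division by 30 happens before the multiplication by 1000.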


@@ -1,5 +1,6 @@
#include "nbdtypes.h" #include "nbdtypes.h"
#include "ioutil.h" #include "ioutil.h"
#include "sockutil.h"
#include "util.h" #include "util.h"
#include "serve.h" #include "serve.h"
@@ -17,43 +18,87 @@ int socket_connect(struct sockaddr* to, struct sockaddr* from)
if (NULL != from) { if (NULL != from) {
if ( 0 > bind( fd, from, sizeof(struct sockaddr_in6 ) ) ){ if ( 0 > bind( fd, from, sizeof(struct sockaddr_in6 ) ) ){
warn( "bind() failed"); warn( SHOW_ERRNO( "bind() to source address failed" ) );
close( fd ); if ( 0 > close( fd ) ) { /* Non-fatal leak */
warn( SHOW_ERRNO( "Failed to close fd %i", fd ) );
}
return -1; return -1;
} }
} }
if ( 0 > connect(fd, to, sizeof(struct sockaddr_in6)) ) { if ( 0 > sock_try_connect( fd, to, sizeof( struct sockaddr_in6 ), 15 ) ) {
warn( "connect failed" ); warn( SHOW_ERRNO( "connect failed" ) );
close( fd ); if ( 0 > close( fd ) ) { /* Non-fatal leak */
warn( SHOW_ERRNO( "Failed to close fd %i", fd ) );
}
return -1; return -1;
} }
if ( sock_set_tcp_nodelay( fd, 1 ) == -1 ) {
warn( SHOW_ERRNO( "Failed to set TCP_NODELAY" ) );
}
return fd; return fd;
} }
int socket_nbd_read_hello(int fd, off64_t * out_size) int nbd_check_hello( struct nbd_init_raw* init_raw, off64_t* out_size )
{ {
struct nbd_init init; if ( strncmp( init_raw->passwd, INIT_PASSWD, 8 ) != 0 ) {
if ( 0 > readloop(fd, &init, sizeof(init)) ) {
warn( "Couldn't read init" );
goto fail;
}
if (strncmp(init.passwd, INIT_PASSWD, 8) != 0) {
warn( "wrong passwd" ); warn( "wrong passwd" );
goto fail; goto fail;
} }
if (be64toh(init.magic) != INIT_MAGIC) { if ( be64toh( init_raw->magic ) != INIT_MAGIC ) {
warn("wrong magic (%x)", be64toh(init.magic)); warn( "wrong magic (%x)", be64toh( init_raw->magic ) );
goto fail; goto fail;
} }
if ( NULL != out_size ) { if ( NULL != out_size ) {
*out_size = be64toh(init.size); *out_size = be64toh( init_raw->size );
} }
return 1; return 1;
fail: fail:
return 0; return 0;
}
int socket_nbd_read_hello( int fd, off64_t* out_size )
{
struct nbd_init_raw init_raw;
if ( 0 > readloop( fd, &init_raw, sizeof(init_raw) ) ) {
warn( "Couldn't read init" );
return 0;
}
return nbd_check_hello( &init_raw, out_size );
}
void nbd_hello_to_buf( struct nbd_init_raw *buf, off64_t out_size )
{
struct nbd_init init;
memcpy( &init.passwd, INIT_PASSWD, 8 );
init.magic = INIT_MAGIC;
init.size = out_size;
memset( buf, 0, sizeof( struct nbd_init_raw ) ); // ensure reserved is 0s
nbd_h2r_init( &init, buf );
return;
}
int socket_nbd_write_hello(int fd, off64_t out_size)
{
struct nbd_init_raw init_raw;
nbd_hello_to_buf( &init_raw, out_size );
if ( 0 > writeloop( fd, &init_raw, sizeof( init_raw ) ) ) {
warn( SHOW_ERRNO( "failed to write hello to socket" ) );
return 0;
}
return 1;
} }
void fill_request(struct nbd_request *request, int type, off64_t from, int len) void fill_request(struct nbd_request *request, int type, off64_t from, int len)
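nbd_hello_to_buf zeroes the whole raw struct before filling it, so the reserved bytes always go out as zeros. The byte-level sketch below follows the same idea. It assumes the classic NBD hello layout: "NBDMAGIC", the 64-bit magic 0x00420281861253, the big-endian export size and then 128 reserved bytes. It does not reproduce flexnbd's nbd_h2r_init, which is not shown, and the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define HELLO_SIZE 152   /* 8 passwd + 8 magic + 8 size + 128 reserved */

static void put_be64( unsigned char *p, uint64_t v )
{
    for ( int i = 7; i >= 0; i-- ) {
        p[i] = (unsigned char) ( v & 0xff );
        v >>= 8;
    }
}

static void hello_to_buf( unsigned char *buf, uint64_t export_size )
{
    assert( buf != NULL );
    memset( buf, 0, HELLO_SIZE );            /* reserved bytes must be zero */
    memcpy( buf, "NBDMAGIC", 8 );
    put_be64( buf + 8, UINT64_C( 0x00420281861253 ) );
    put_be64( buf + 16, export_size );
}
```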
@@ -94,9 +139,10 @@ void wait_for_data( int fd, int timeout_secs )
FD_ZERO( &fds ); FD_ZERO( &fds );
FD_SET( fd, &fds ); FD_SET( fd, &fds );
selected = select( FD_SETSIZE,
&fds, NULL, NULL, selected = sock_try_select(
timeout_secs >=0 ? &tv : NULL ); FD_SETSIZE, &fds, NULL, NULL, timeout_secs >=0 ? &tv : NULL
);
FATAL_IF( -1 == selected, "Select failed" ); FATAL_IF( -1 == selected, "Select failed" );
ERROR_IF( 0 == selected, "Timed out waiting for reply" ); ERROR_IF( 0 == selected, "Timed out waiting for reply" );
@@ -152,18 +198,6 @@ void socket_nbd_write(int fd, off64_t from, int len, int in_fd, void* in_buf, in
} }
void socket_nbd_entrust( int fd )
{
struct nbd_request request;
struct nbd_reply reply;
fill_request( &request, REQUEST_ENTRUST, 0, 0 );
FATAL_IF_NEGATIVE( writeloop( fd, &request, sizeof( request ) ),
"Couldn't write request");
read_reply( fd, &request, &reply );
}
int socket_nbd_disconnect( int fd ) int socket_nbd_disconnect( int fd )
{ {
int success = 1; int success = 1;


@@ -4,13 +4,20 @@
#include <sys/types.h> #include <sys/types.h>
#include <sys/socket.h> #include <sys/socket.h>
#include "nbdtypes.h"
int socket_connect(struct sockaddr* to, struct sockaddr* from); int socket_connect(struct sockaddr* to, struct sockaddr* from);
int socket_nbd_read_hello(int fd, off64_t * size); int socket_nbd_read_hello(int fd, off64_t * size);
int socket_nbd_write_hello(int fd, off64_t size);
void socket_nbd_read(int fd, off64_t from, int len, int out_fd, void* out_buf, int timeout_secs); void socket_nbd_read(int fd, off64_t from, int len, int out_fd, void* out_buf, int timeout_secs);
void socket_nbd_write(int fd, off64_t from, int len, int out_fd, void* out_buf, int timeout_secs); void socket_nbd_write(int fd, off64_t from, int len, int out_fd, void* out_buf, int timeout_secs);
void socket_nbd_entrust(int fd);
int socket_nbd_disconnect( int fd ); int socket_nbd_disconnect( int fd );
/* as you can see, we're slowly accumulating code that should really be in an
* NBD library */
void nbd_hello_to_buf( struct nbd_init_raw* buf, off64_t out_size );
int nbd_check_hello( struct nbd_init_raw* init_raw, off64_t* out_size );
#endif #endif


@@ -15,12 +15,13 @@ void print_response( const char * response )
NULLCHECK( response ); NULLCHECK( response );
exit_status = atoi(response); exit_status = atoi(response);
response_text = strchr( response, ':' ) + 2; response_text = strchr( response, ':' );
NULLCHECK( response_text ); FATAL_IF_NULL( response_text,
"Error parsing server response: '%s'", response );
out = exit_status > 0 ? stderr : stdout; out = exit_status > 0 ? stderr : stdout;
fprintf(out, "%s\n", response_text ); fprintf(out, "%s\n", response_text + 2);
} }
void do_remote_command(char* command, char* socket_name, int argc, char** argv) void do_remote_command(char* command, char* socket_name, int argc, char** argv)
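The new print_response checks for a missing colon, but it still skips exactly two characters after the colon. That assumes the server always sends ": " before the text; a response ending at the colon would be read past its terminating NUL. A more defensive split is sketched below with a hypothetical helper name.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Split "<status>: <text>" into its parts. Returns a pointer to the text
 * (possibly empty), or NULL if there is no colon at all. */
static const char *split_response( const char *response, int *status )
{
    const char *colon;

    assert( response != NULL && status != NULL );
    colon = strchr( response, ':' );
    if ( colon == NULL ) {
        return NULL;
    }
    *status = atoi( response );
    colon++;                    /* skip the ':' */
    if ( *colon == ' ' ) {
        colon++;                /* skip one optional space, never the NUL */
    }
    return colon;
}
```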


@@ -2,6 +2,7 @@
#include "client.h" #include "client.h"
#include "nbdtypes.h" #include "nbdtypes.h"
#include "ioutil.h" #include "ioutil.h"
#include "sockutil.h"
#include "util.h" #include "util.h"
#include "bitset.h" #include "bitset.h"
#include "control.h" #include "control.h"
@@ -20,22 +21,6 @@
#include <sys/socket.h> #include <sys/socket.h>
#include <netinet/tcp.h> #include <netinet/tcp.h>
static inline void* sockaddr_address_data(struct sockaddr* sockaddr)
{
NULLCHECK( sockaddr );
struct sockaddr_in* in = (struct sockaddr_in*) sockaddr;
struct sockaddr_in6* in6 = (struct sockaddr_in6*) sockaddr;
if (sockaddr->sa_family == AF_INET) {
return &in->sin_addr;
}
if (sockaddr->sa_family == AF_INET6) {
return &in6->sin6_addr;
}
return NULL;
}
struct server * server_create ( struct server * server_create (
struct flexnbd * flexnbd, struct flexnbd * flexnbd,
char* s_ip_address, char* s_ip_address,
@@ -45,16 +30,20 @@ struct server * server_create (
int acl_entries, int acl_entries,
char** s_acl_entries, char** s_acl_entries,
int max_nbd_clients, int max_nbd_clients,
int has_control) int use_killswitch,
int success)
{ {
NULLCHECK( flexnbd ); NULLCHECK( flexnbd );
struct server * out; struct server * out;
out = xmalloc( sizeof( struct server ) ); out = xmalloc( sizeof( struct server ) );
out->flexnbd = flexnbd; out->flexnbd = flexnbd;
out->has_control = has_control; out->success = success;
out->max_nbd_clients = max_nbd_clients; out->max_nbd_clients = max_nbd_clients;
out->nbd_client = xmalloc( max_nbd_clients * sizeof( struct client_tbl_entry ) ); out->use_killswitch = use_killswitch;
server_allow_new_clients( out );
out->nbd_client = xmalloc( max_nbd_clients * sizeof( struct client_tbl_entry ) );
out->tcp_backlog = 10; /* does this need to be settable? */ out->tcp_backlog = 10; /* does this need to be settable? */
FATAL_IF_NULL(s_ip_address, "No IP address supplied"); FATAL_IF_NULL(s_ip_address, "No IP address supplied");
@@ -77,12 +66,11 @@ struct server * server_create (
parse_port( s_port, &out->bind_to.v4 ); parse_port( s_port, &out->bind_to.v4 );
out->filename = s_file; out->filename = s_file;
out->filename_incomplete = xmalloc(strlen(s_file)+11+1);
strcpy(out->filename_incomplete, s_file);
strcpy(out->filename_incomplete + strlen(s_file), ".INCOMPLETE");
out->l_io = flexthread_mutex_create();
out->l_acl = flexthread_mutex_create(); out->l_acl = flexthread_mutex_create();
out->l_start_mirror = flexthread_mutex_create();
out->mirror_can_start = 1;
out->close_signal = self_pipe_create(); out->close_signal = self_pipe_create();
out->acl_updated_signal = self_pipe_create(); out->acl_updated_signal = self_pipe_create();
@@ -100,28 +88,29 @@ void server_destroy( struct server * serve )
self_pipe_destroy( serve->close_signal ); self_pipe_destroy( serve->close_signal );
serve->close_signal = NULL; serve->close_signal = NULL;
flexthread_mutex_destroy( serve->l_start_mirror );
flexthread_mutex_destroy( serve->l_acl ); flexthread_mutex_destroy( serve->l_acl );
flexthread_mutex_destroy( serve->l_io );
if ( serve->acl ) { if ( serve->acl ) {
acl_destroy( serve->acl ); acl_destroy( serve->acl );
serve->acl = NULL; serve->acl = NULL;
} }
free( serve->filename_incomplete );
free( serve->nbd_client ); free( serve->nbd_client );
free( serve ); free( serve );
} }
void server_dirty(struct server *serve, off64_t from, int len) void server_unlink( struct server * serve )
{ {
NULLCHECK( serve ); NULLCHECK( serve );
NULLCHECK( serve->filename );
FATAL_IF_NEGATIVE( unlink( serve->filename ),
"Failed to unlink %s: %s",
serve->filename,
strerror( errno ) );
if (serve->mirror) {
bitset_set_range(serve->mirror->dirty_map, from, len);
}
} }
#define SERVER_LOCK( s, f, msg ) \ #define SERVER_LOCK( s, f, msg ) \
@@ -131,30 +120,6 @@ void server_dirty(struct server *serve, off64_t from, int len)
do { NULLCHECK( s ); \ do { NULLCHECK( s ); \
FATAL_IF( 0 != flexthread_mutex_unlock( s->f ), msg ); } while (0) FATAL_IF( 0 != flexthread_mutex_unlock( s->f ), msg ); } while (0)
void server_lock_io( struct server * serve)
{
debug("IO locking");
SERVER_LOCK( serve, l_io, "Problem with I/O lock" );
}
void server_unlock_io( struct server* serve )
{
debug("IO unlocking");
SERVER_UNLOCK( serve, l_io, "Problem with I/O unlock" );
}
/* This is only to be called from error handlers. */
int server_io_locked( struct server * serve )
{
NULLCHECK( serve );
return flexthread_mutex_held( serve->l_io );
}
void server_lock_acl( struct server *serve ) void server_lock_acl( struct server *serve )
{ {
debug("ACL locking"); debug("ACL locking");
@@ -164,6 +129,8 @@ void server_lock_acl( struct server *serve )
void server_unlock_acl( struct server *serve ) void server_unlock_acl( struct server *serve )
{ {
debug( "ACL unlocking" );
SERVER_UNLOCK( serve, l_acl, "Problem with ACL unlock" ); SERVER_UNLOCK( serve, l_acl, "Problem with ACL unlock" );
} }
@@ -175,6 +142,26 @@ int server_acl_locked( struct server * serve )
} }
void server_lock_start_mirror( struct server *serve )
{
debug("Mirror start locking");
SERVER_LOCK( serve, l_start_mirror, "Problem with start mirror lock" );
}
void server_unlock_start_mirror( struct server *serve )
{
debug("Mirror start unlocking");
SERVER_UNLOCK( serve, l_start_mirror, "Problem with start mirror unlock" );
}
int server_start_mirror_locked( struct server * serve )
{
NULLCHECK( serve );
return flexthread_mutex_held( serve->l_start_mirror );
}
/** Return the actual port the server bound to. This is used because we /** Return the actual port the server bound to. This is used because we
* are allowed to pass "0" on the command-line. * are allowed to pass "0" on the command-line.
*/ */
@@ -192,74 +179,15 @@ int server_port( struct server * server )
} }
/* Try to bind to our serving socket, retrying until it works or gives a
* fatal error. */
void serve_bind( struct server * serve )
{
int bind_result;
char s_address[64];
memset( s_address, 0, 64 );
strcpy( s_address, "???" );
inet_ntop( serve->bind_to.generic.sa_family,
sockaddr_address_data( &serve->bind_to.generic),
s_address, 64 );
do {
bind_result = bind(
serve->server_fd,
&serve->bind_to.generic,
sizeof(serve->bind_to));
if ( 0 == bind_result ) {
info( "Bound to %s port %d",
s_address,
ntohs(serve->bind_to.v4.sin_port));
break;
}
else {
warn( "Couldn't bind to %s port %d: %s",
s_address,
ntohs(serve->bind_to.v4.sin_port),
strerror( errno ) );
switch (errno){
/* bind() can give us EACCES,
* EADDRINUSE, EADDRNOTAVAIL, EBADF,
* EINVAL or ENOTSOCK.
*
* Any of these other than EACCES,
* EADDRINUSE or EADDRNOTAVAIL signify
* that there's a logic error somewhere.
*/
case EACCES:
case EADDRINUSE:
case EADDRNOTAVAIL:
debug("retrying");
sleep(1);
continue;
default:
fatal( "Giving up" );
}
}
} while ( 1 );
}
/** Prepares a listening socket for the NBD server, binding etc. */ /** Prepares a listening socket for the NBD server, binding etc. */
void serve_open_server_socket(struct server* params) void serve_open_server_socket(struct server* params)
{ {
NULLCHECK( params ); NULLCHECK( params );
int optval=1;
params->server_fd = socket(params->bind_to.generic.sa_family == AF_INET ? params->server_fd = socket(params->bind_to.generic.sa_family == AF_INET ?
PF_INET : PF_INET6, SOCK_STREAM, 0); PF_INET : PF_INET6, SOCK_STREAM, 0);
FATAL_IF_NEGATIVE(params->server_fd, FATAL_IF_NEGATIVE( params->server_fd, "Couldn't create server socket" );
"Couldn't create server socket");
/* We need SO_REUSEADDR so that when we switch from listening to /* We need SO_REUSEADDR so that when we switch from listening to
* serving we don't have to change address if we don't want to. * serving we don't have to change address if we don't want to.
@@ -270,8 +198,7 @@ void serve_open_server_socket(struct server* params)
* we barf. * we barf.
*/ */
FATAL_IF_NEGATIVE( FATAL_IF_NEGATIVE(
setsockopt(params->server_fd, SOL_SOCKET, SO_REUSEADDR, &optval, sizeof(optval)), sock_set_reuseaddr( params->server_fd, 1 ), "Couldn't set SO_REUSEADDR"
"Couldn't set SO_REUSEADDR"
); );
/* TCP_NODELAY makes everything not be slow. If we can't set /* TCP_NODELAY makes everything not be slow. If we can't set
@@ -279,14 +206,16 @@ void serve_open_server_socket(struct server* params)
* understand. * understand.
*/ */
FATAL_IF_NEGATIVE( FATAL_IF_NEGATIVE(
setsockopt(params->server_fd, IPPROTO_TCP, TCP_NODELAY, &optval, sizeof(optval)), sock_set_tcp_nodelay( params->server_fd, 1 ), "Couldn't set TCP_NODELAY"
"Couldn't set TCP_NODELAY"
); );
/* If we can't bind, presumably that's because someone else is /* If we can't bind, presumably that's because someone else is
* squatting on our ip/port combo, or the ip isn't yet * squatting on our ip/port combo, or the ip isn't yet
* configured. Ideally we want to retry this. */ * configured. Ideally we want to retry this. */
serve_bind(params); FATAL_UNLESS_ZERO(
sock_try_bind( params->server_fd, &params->bind_to.generic ),
SHOW_ERRNO( "Failed to bind() socket" )
);
FATAL_IF_NEGATIVE( FATAL_IF_NEGATIVE(
listen(params->server_fd, params->tcp_backlog), listen(params->server_fd, params->tcp_backlog),
@@ -307,26 +236,23 @@ int tryjoin_client_thread( struct client_tbl_entry *entry, int (*joinfunc)(pthre
int join_errno; int join_errno;
if (entry->thread != 0) { if (entry->thread != 0) {
char s_client_address[64]; char s_client_address[128];
memset(s_client_address, 0, 64); sockaddr_address_string( &entry->address.generic, &s_client_address[0], 128 );
strcpy(s_client_address, "???");
inet_ntop( entry->address.generic.sa_family,
sockaddr_address_data(&entry->address.generic),
s_client_address,
64 );
debug( "%s(%p,...)", joinfunc == pthread_join ? "joining" : "tryjoining", entry->thread ); debug( "%s(%p,...)", joinfunc == pthread_join ? "joining" : "tryjoining", entry->thread );
join_errno = joinfunc(entry->thread, &status); join_errno = joinfunc(entry->thread, &status);
/* join_errno can legitimately be ESRCH if the thread is /* join_errno can legitimately be ESRCH if the thread is
* already dead, but the client still needs tidying up. */ * already dead, but the client still needs tidying up. */
if (join_errno != 0 && !entry->client->stopped ) { if (join_errno != 0 && !entry->client->stopped ) {
debug( "join_errno was %s, stopped was %d", strerror( join_errno ), entry->client->stopped );
FATAL_UNLESS( join_errno == EBUSY, FATAL_UNLESS( join_errno == EBUSY,
"Problem with joining thread %p: %s", "Problem with joining thread %p: %s",
entry->thread, entry->thread,
strerror(join_errno) ); strerror(join_errno) );
} }
else { else if ( join_errno == 0 ) {
debug("nbd thread %016x exited (%s) with status %ld", debug("nbd thread %016x exited (%s) with status %ld",
entry->thread, entry->thread,
s_client_address, s_client_address,
@@ -401,6 +327,20 @@ int cleanup_and_find_client_slot(struct server* params)
return slot; return slot;
} }
int server_count_clients( struct server *params )
{
NULLCHECK( params );
int i, count = 0;
for ( i = 0 ; i < params->max_nbd_clients ; i++ ) {
if ( params->nbd_client[i].thread != 0 ) {
count++;
}
}
return count;
}
/** Check whether the address client_address is allowed or not according /** Check whether the address client_address is allowed or not according
* to the current acl. If params->acl is NULL, the result will be 1, * to the current acl. If params->acl is NULL, the result will be 1,
@@ -435,9 +375,11 @@ int server_should_accept_client(
NULLCHECK( client_address ); NULLCHECK( client_address );
NULLCHECK( s_client_address ); NULLCHECK( s_client_address );
if (inet_ntop(client_address->generic.sa_family, const char* result = sockaddr_address_string(
sockaddr_address_data(&client_address->generic), &client_address->generic, s_client_address, s_client_address_len
s_client_address, s_client_address_len ) == NULL) { );
if ( NULL == result ) {
warn( "Rejecting client %s: Bad client_address", s_client_address ); warn( "Rejecting client %s: Bad client_address", s_client_address );
return 0; return 0;
} }
@@ -483,18 +425,22 @@ void accept_nbd_client(
if ( !server_should_accept_client( params, client_address, s_client_address, 64 ) ) { if ( !server_should_accept_client( params, client_address, s_client_address, 64 ) ) {
close( client_fd ); FATAL_IF_NEGATIVE( close( client_fd ),
"Error closing client socket fd %d", client_fd );
debug("Closed client socket fd %d", client_fd);
return; return;
} }
slot = cleanup_and_find_client_slot(params); slot = cleanup_and_find_client_slot(params);
if (slot < 0) { if (slot < 0) {
warn("too many clients to accept connection"); warn("too many clients to accept connection");
close(client_fd); FATAL_IF_NEGATIVE( close( client_fd ),
"Error closing client socket fd %d", client_fd );
debug("Closed client socket fd %d", client_fd);
return; return;
} }
debug( "Client %s accepted.", s_client_address ); info( "Client %s accepted on fd %d.", s_client_address, client_fd );
client_params = client_create( params, client_fd ); client_params = client_create( params, client_fd );
params->nbd_client[slot].client = client_params; params->nbd_client[slot].client = client_params;
@@ -506,7 +452,9 @@ void accept_nbd_client(
if ( 0 != spawn_client_thread( client_params, thread ) ) { if ( 0 != spawn_client_thread( client_params, thread ) ) {
debug( "Thread creation problem." ); debug( "Thread creation problem." );
client_destroy( client_params ); client_destroy( client_params );
close(client_fd); FATAL_IF_NEGATIVE( close(client_fd),
"Error closing client socket fd %d", client_fd );
debug("Closed client socket fd %d", client_fd);
return; return;
} }
@@ -526,7 +474,7 @@ void server_audit_clients( struct server * serve)
* won't have been audited against the later acl. This isn't a * won't have been audited against the later acl. This isn't a
* problem though, because in order to update the acl * problem though, because in order to update the acl
* server_replace_acl must have been called, so the * server_replace_acl must have been called, so the
* server_accept ioop will see a second acl_updated signal as * server_accept loop will see a second acl_updated signal as
* soon as it hits select, and a second audit will be run. * soon as it hits select, and a second audit will be run.
*/ */
for( i = 0; i < serve->max_nbd_clients; i++ ) { for( i = 0; i < serve->max_nbd_clients; i++ ) {
@@ -551,7 +499,7 @@ void server_close_clients( struct server *params )
info("closing all clients"); info("closing all clients");
int i, j; int i; /* , j; */
struct client_tbl_entry *entry; struct client_tbl_entry *entry;
for( i = 0; i < params->max_nbd_clients; i++ ) { for( i = 0; i < params->max_nbd_clients; i++ ) {
@@ -562,9 +510,17 @@ void server_close_clients( struct server *params )
client_signal_stop( entry->client ); client_signal_stop( entry->client );
} }
} }
for( j = 0; j < params->max_nbd_clients; j++ ) { /* We don't join the clients here. When we enter the final
join_client_thread( &params->nbd_client[j] ); * mirror pass, we get the IO lock, then wait for the server_fd
} * to close before sending the data, to be sure that no new
* clients can be accepted which might think they've written
* to the disc. However, an existing client thread can be
* waiting for the IO lock already, so if we try to join it
* here, we deadlock.
*
* The client threads will be joined in serve_cleanup.
*
*/
} }
@@ -592,6 +548,50 @@ void server_replace_acl( struct server *serve, struct acl * new_acl )
} }
void server_prevent_mirror_start( struct server *serve )
{
NULLCHECK( serve );
serve->mirror_can_start = 0;
}
void server_allow_mirror_start( struct server *serve )
{
NULLCHECK( serve );
serve->mirror_can_start = 1;
}
/* Only call this with the mirror start lock held */
int server_mirror_can_start( struct server *serve )
{
NULLCHECK( serve );
return serve->mirror_can_start;
}
/* Queries whether we are currently mirroring. If we are, we need
* to communicate that via the process exit status, because otherwise
* the supervisor will assume the migration completed.
*/
int serve_shutdown_is_graceful( struct server *params )
{
int is_mirroring = 0;
server_lock_start_mirror( params );
{
if ( server_is_mirroring( params ) ) {
is_mirroring = 1;
warn( "Stop signal received while mirroring." );
server_prevent_mirror_start( params );
}
}
server_unlock_start_mirror( params );
return !is_mirroring;
}
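serve_shutdown_is_graceful checks for an in-flight mirror and, if it finds one, stops further mirrors from starting, doing both while holding the start-mirror lock. The following is a minimal sketch of that check-then-act pattern. It assumes a hypothetical cut-down `struct demo_server`, uses a plain `pthread_mutex_t` in place of flexnbd's `flexthread_mutex`, and uses an `is_mirroring` flag in place of `server_is_mirroring()`.

```c
#include <pthread.h>

/* Hypothetical stand-in for struct server, keeping only what the check needs. */
struct demo_server {
    pthread_mutex_t l_start_mirror; /* guards both fields below */
    int is_mirroring;               /* nonzero while a mirror thread runs */
    int mirror_can_start;           /* cleared to stop new mirrors starting */
};

/* Returns 1 if stopping now is graceful (no mirror in flight), else 0.
 * If a mirror is running, further mirror starts are also forbidden.
 * Both happen under one lock, so a concurrent mirror start cannot slip
 * in between the check and the prevention. */
int demo_shutdown_is_graceful(struct demo_server *s)
{
    int is_mirroring;

    pthread_mutex_lock(&s->l_start_mirror);
    is_mirroring = s->is_mirroring;
    if (is_mirroring) {
        s->mirror_can_start = 0;
    }
    pthread_mutex_unlock(&s->l_start_mirror);

    return !is_mirroring;
}
```

In server_accept the result is ANDed into `params->success`, so a stop signal received mid-migration makes the process report failure instead of completion.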
/** Accept either an NBD or control socket connection, dispatch appropriately */ /** Accept either an NBD or control socket connection, dispatch appropriately */
int server_accept( struct server * params ) int server_accept( struct server * params )
@@ -605,6 +605,7 @@ int server_accept( struct server * params )
/* We select on this fd to receive OS signals (only a few of /* We select on this fd to receive OS signals (only a few of
* which we're interested in, see flexnbd.c */ * which we're interested in, see flexnbd.c */
int signal_fd = flexnbd_signal_fd( params->flexnbd ); int signal_fd = flexnbd_signal_fd( params->flexnbd );
int should_continue = 1;
FD_ZERO(&fds); FD_ZERO(&fds);
FD_SET(params->server_fd, &fds); FD_SET(params->server_fd, &fds);
@@ -612,18 +613,22 @@ int server_accept( struct server * params )
self_pipe_fd_set( params->close_signal, &fds ); self_pipe_fd_set( params->close_signal, &fds );
self_pipe_fd_set( params->acl_updated_signal, &fds ); self_pipe_fd_set( params->acl_updated_signal, &fds );
FATAL_IF_NEGATIVE(select(FD_SETSIZE, &fds, FATAL_IF_NEGATIVE(
NULL, NULL, NULL), "select() failed"); sock_try_select(FD_SETSIZE, &fds, NULL, NULL, NULL),
SHOW_ERRNO( "select() failed" )
);
if ( self_pipe_fd_isset( params->close_signal, &fds ) ){ if ( self_pipe_fd_isset( params->close_signal, &fds ) ){
server_close_clients( params ); server_close_clients( params );
return 0; should_continue = 0;
} }
if ( 0 < signal_fd && FD_ISSET( signal_fd, &fds ) ){ if ( 0 < signal_fd && FD_ISSET( signal_fd, &fds ) ){
debug( "Stop signal received." ); debug( "Stop signal received." );
server_close_clients( params ); server_close_clients( params );
return 0; params->success = params->success && serve_shutdown_is_graceful( params );
should_continue = 0;
} }
@@ -634,11 +639,17 @@ int server_accept( struct server * params )
if ( FD_ISSET( params->server_fd, &fds ) ){ if ( FD_ISSET( params->server_fd, &fds ) ){
client_fd = accept( params->server_fd, &client_address.generic, &socklen ); client_fd = accept( params->server_fd, &client_address.generic, &socklen );
debug("Accepted nbd client socket");
if ( params->allow_new_clients ) {
debug("Accepted nbd client socket fd %d", client_fd);
accept_nbd_client(params, client_fd, &client_address); accept_nbd_client(params, client_fd, &client_address);
} else {
debug( "New NBD client socket %d not allowed", client_fd );
sock_try_close( client_fd );
}
} }
return 1; return should_continue;
} }
@@ -648,12 +659,47 @@ void serve_accept_loop(struct server* params)
while( server_accept( params ) ); while( server_accept( params ) );
} }
void* build_allocation_map_thread(void* serve_uncast)
{
NULLCHECK( serve_uncast );
struct server* serve = (struct server*) serve_uncast;
NULLCHECK( serve->filename );
NULLCHECK( serve->allocation_map );
int fd = open( serve->filename, O_RDONLY );
FATAL_IF_NEGATIVE( fd, "Couldn't open %s", serve->filename );
if ( build_allocation_map( serve->allocation_map, fd ) ) {
serve->allocation_map_built = 1;
}
else {
/* We can operate without it, but we can't free it without a race.
* All that happens if we leave it is that it gradually builds up an
* *incomplete* record of writes. Nobody will use it, as
* allocation_map_built == 0 for the lifetime of the process.
*
* The stream functionality can still be relied on. We don't need to
* worry about mirroring waiting for the allocation map to finish,
* because we already copy every byte at least once. If that changes in
* the future, we'll need to wait for the allocation map to finish or
* fail before we can complete the migration.
*/
warn( "Didn't build allocation map for %s", serve->filename );
}
close( fd );
return NULL;
}
/** Initialisation function that sets up the initial allocation map, i.e. so /** Initialisation function that sets up the initial allocation map, i.e. so
* we know which blocks of the file are allocated. * we know which blocks of the file are allocated.
*/ */
void serve_init_allocation_map(struct server* params) void serve_init_allocation_map(struct server* params)
{ {
NULLCHECK( params ); NULLCHECK( params );
NULLCHECK( params->filename );
int fd = open( params->filename, O_RDONLY ); int fd = open( params->filename, O_RDONLY );
off64_t size; off64_t size;
@@ -663,12 +709,52 @@ void serve_init_allocation_map(struct server* params)
params->size = size; params->size = size;
FATAL_IF_NEGATIVE( size, "Couldn't find size of %s", FATAL_IF_NEGATIVE( size, "Couldn't find size of %s",
params->filename ); params->filename );
params->allocation_map = params->allocation_map =
build_allocation_map(fd, size, block_allocation_resolution); bitset_alloc( params->size, block_allocation_resolution );
close(fd);
int ok = pthread_create( &params->allocation_map_builder_thread,
NULL,
build_allocation_map_thread,
params );
FATAL_UNLESS_ZERO( ok, "Couldn't create thread" );
} }
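serve_init_allocation_map now only allocates the map and hands the slow scan to a background thread, which sets `allocation_map_built` when the scan succeeds and leaves it at 0 otherwise. Below is a minimal sketch of that publish-a-flag pattern with a hypothetical `struct demo_job`. One caveat: `volatile sig_atomic_t` (the type the struct declares) is not a C11 thread-synchronisation mechanism, so this sketch only reads the flag after `pthread_join()`, which does synchronise. Code polling it from another thread would more safely use `_Atomic`.

```c
#include <pthread.h>
#include <signal.h> /* sig_atomic_t */

/* Hypothetical job description; not part of flexnbd. */
struct demo_job {
    int should_succeed;          /* input: simulate a scan that works or fails */
    volatile sig_atomic_t built; /* output: set to 1 only if the scan worked */
};

/* Thread body: do the slow work, then publish success through the flag.
 * On failure the flag stays 0, so readers simply never trust the result. */
static void *demo_builder(void *arg)
{
    struct demo_job *job = arg;

    /* A real builder would scan the backing file here. */
    if (job->should_succeed) {
        job->built = 1;
    }
    return NULL;
}

/* Start the builder in the background. pthread_create() returns 0 or a
 * positive error number; it does not return -1 or set errno. */
int demo_start_builder(struct demo_job *job, pthread_t *tid)
{
    job->built = 0;
    return pthread_create(tid, NULL, demo_builder, job);
}
```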
void server_forbid_new_clients( struct server * serve )
{
serve->allow_new_clients = 0;
return;
}
void server_allow_new_clients( struct server * serve )
{
serve->allow_new_clients = 1;
return;
}
void server_join_clients( struct server * serve ) {
int i;
void* status;
for (i=0; i < serve->max_nbd_clients; i++) {
pthread_t thread_id = serve->nbd_client[i].thread;
int err = 0;
if (thread_id != 0) {
debug( "joining thread %p", thread_id );
if ( 0 == (err = pthread_join( thread_id, &status ) ) ) {
serve->nbd_client[i].thread = 0;
} else {
warn( "Error %s (%i) joining thread %p", strerror( err ), err, thread_id );
}
}
}
return;
}
/* Tell the server to close all the things. */ /* Tell the server to close all the things. */
void serve_signal_close( struct server * serve ) void serve_signal_close( struct server * serve )
{ {
@@ -677,6 +763,7 @@ void serve_signal_close( struct server * serve )
self_pipe_signal( serve->close_signal ); self_pipe_signal( serve->close_signal );
} }
/* Block until the server closes the server_fd. /* Block until the server closes the server_fd.
*/ */
void serve_wait_for_close( struct server * serve ) void serve_wait_for_close( struct server * serve )
@@ -686,55 +773,73 @@ void serve_wait_for_close( struct server * serve )
} }
} }
/* We've just had an ENTRUST/DISCONNECT pair, so we need to shut down /* We've just had a DISCONNECT, so we need to shut down
* and signal our listener that we can safely take over. * and signal our listener that we can safely take over.
*/ */
void server_control_arrived( struct server *serve ) void server_control_arrived( struct server *serve )
{ {
debug( "server_control_arrived" );
NULLCHECK( serve ); NULLCHECK( serve );
serve->has_control = 1; if ( !serve->success ) {
serve->success = 1;
serve_signal_close( serve ); serve_signal_close( serve );
} }
}
void flexnbd_stop_control( struct flexnbd * flexnbd );
/** Closes sockets, frees memory and waits for all client threads to finish */ /** Closes sockets, frees memory and waits for all client threads to finish */
void serve_cleanup(struct server* params, void serve_cleanup(struct server* params,
int fatal __attribute__ ((unused)) ) int fatal __attribute__ ((unused)) )
{ {
NULLCHECK( params ); NULLCHECK( params );
void* status;
info("cleaning up"); info("cleaning up");
int i;
if (params->server_fd){ close(params->server_fd); } if (params->server_fd){ close(params->server_fd); }
/* need to stop background build if we're killed very early on */
pthread_cancel(params->allocation_map_builder_thread);
pthread_join(params->allocation_map_builder_thread, &status);
int need_mirror_lock;
need_mirror_lock = !server_start_mirror_locked( params );
if ( need_mirror_lock ) { server_lock_start_mirror( params ); }
{
if ( server_is_mirroring( params ) ) {
server_abandon_mirror( params );
}
server_prevent_mirror_start( params );
}
if ( need_mirror_lock ) { server_unlock_start_mirror( params ); }
server_join_clients( params );
if (params->allocation_map) { if (params->allocation_map) {
free(params->allocation_map); bitset_free( params->allocation_map );
} }
if (params->mirror_super) { if ( server_start_mirror_locked( params ) ) {
/* AWOOGA! RACE! */ server_unlock_start_mirror( params );
pthread_t mirror_t = params->mirror_super->thread;
params->mirror->signal_abandon = 1;
pthread_join( mirror_t, NULL );
}
for (i=0; i < params->max_nbd_clients; i++) {
void* status;
pthread_t thread_id = params->nbd_client[i].thread;
if (thread_id != 0) {
debug("joining thread %p", thread_id);
pthread_join(thread_id, &status);
}
} }
if ( server_acl_locked( params ) ) { if ( server_acl_locked( params ) ) {
server_unlock_acl( params ); server_unlock_acl( params );
} }
/* if( params->flexnbd ) { */
/* if ( params->flexnbd->control ) { */
/* flexnbd_stop_control( params->flexnbd ); */
/* } */
/* flexnbd_destroy( params->flexnbd ); */
/* } */
/* server_destroy( params ); */
debug( "Cleanup done"); debug( "Cleanup done");
} }
@@ -742,7 +847,71 @@ void serve_cleanup(struct server* params,
int server_is_in_control( struct server *serve ) int server_is_in_control( struct server *serve )
{ {
NULLCHECK( serve ); NULLCHECK( serve );
return serve->has_control; return serve->success;
}
int server_is_mirroring( struct server * serve )
{
NULLCHECK( serve );
return !!serve->mirror_super;
}
uint64_t server_mirror_bytes_remaining( struct server * serve )
{
if ( server_is_mirroring( serve ) ) {
uint64_t bytes_to_xfer =
bitset_stream_queued_bytes( serve->allocation_map, BITSET_STREAM_SET ) +
( serve->size - serve->mirror->offset );
return bytes_to_xfer;
}
return 0;
}
/* Given historic bps measurements and number of bytes left to transfer, give
* an estimate of how many seconds are remaining before the migration is
* complete, assuming no new bytes are written.
*/
uint64_t server_mirror_eta( struct server * serve )
{
if ( server_is_mirroring( serve ) ) {
uint64_t bytes_to_xfer = server_mirror_bytes_remaining( serve );
return bytes_to_xfer / ( mirror_current_bps( serve->mirror ) + 1 );
}
return 0;
}
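server_mirror_bytes_remaining counts the part of the file from the mirror's current offset to the end, plus the bytes represented by events queued in the allocation map's stream (`bitset_stream_queued_bytes` with `BITSET_STREAM_SET`). server_mirror_eta then divides by the measured rate plus one. The arithmetic alone, as a sketch with hypothetical function names and plain integers in place of the server and mirror structs:

```c
#include <stdint.h>

/* Bytes still to send: the rest of the file past the mirror's offset,
 * plus the bytes represented by queued stream events. */
uint64_t demo_bytes_remaining(uint64_t size, uint64_t offset, uint64_t queued_bytes)
{
    return queued_bytes + (size - offset);
}

/* Seconds left, assuming no new writes arrive. Adding 1 to the rate avoids
 * dividing by zero before any throughput has been measured (so the rate
 * must stay below UINT64_MAX); integer division rounds down. */
uint64_t demo_eta_seconds(uint64_t bytes_remaining, uint64_t bytes_per_second)
{
    return bytes_remaining / (bytes_per_second + 1);
}
```

For example, a 1000-byte file mirrored up to offset 400 with 100 queued bytes has 700 bytes left; at a measured 99 bytes/s that is 700 / 100 = 7 seconds.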
void mirror_super_destroy( struct mirror_super * super );
/* This must only be called with the start_mirror lock held */
void server_abandon_mirror( struct server * serve )
{
NULLCHECK( serve );
if ( serve->mirror_super ) {
/* FIXME: AWOOGA! RACE!
* We can set abandon_signal after mirror_super has checked it, but
* before the reset. However, mirror_reset doesn't clear abandon_signal
* so it'll just terminate early on the next pass. */
ERROR_UNLESS(
self_pipe_signal( serve->mirror->abandon_signal ),
"Failed to signal abandon to mirror"
);
pthread_t tid = serve->mirror_super->thread;
pthread_join( tid, NULL );
debug( "Mirror thread %p pthread_join returned", tid );
server_allow_mirror_start( serve );
mirror_super_destroy( serve->mirror_super );
serve->mirror = NULL;
serve->mirror_super = NULL;
debug( "Mirror supervisor done." );
}
} }
int server_default_deny( struct server * serve ) int server_default_deny( struct server * serve )
@@ -752,19 +921,24 @@ int server_default_deny( struct server * serve )
} }
/** Full lifecycle of the server */ /** Full lifecycle of the server */
int do_serve(struct server* params) int do_serve( struct server* params, struct self_pipe * open_signal )
{ {
NULLCHECK( params ); NULLCHECK( params );
int has_control; int success;
error_set_handler((cleanup_handler*) serve_cleanup, params); error_set_handler((cleanup_handler*) serve_cleanup, params);
serve_open_server_socket(params); serve_open_server_socket(params);
/* Only signal that we are open for business once the server
socket is open */
if ( NULL != open_signal ) { self_pipe_signal( open_signal ); }
serve_init_allocation_map(params); serve_init_allocation_map(params);
serve_accept_loop(params); serve_accept_loop(params);
has_control = params->has_control; success = params->success;
serve_cleanup(params, 0); serve_cleanup(params, 0);
return has_control; return success;
} }
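The server is woken through self-pipes (`close_signal`, `acl_updated_signal`, and now the `open_signal` passed to do_serve), which let one thread wake another thread's select() loop. The self_pipe implementation is not part of this diff, so the following is a generic sketch of the technique with hypothetical names, not flexnbd's API.

```c
#include <errno.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical minimal self-pipe: fds[0] is the read end, fds[1] the write end. */
struct demo_self_pipe {
    int fds[2];
};

int demo_self_pipe_init(struct demo_self_pipe *sp)
{
    return pipe(sp->fds);
}

/* Wake any thread selecting on the read end by writing one byte. */
int demo_self_pipe_signal(struct demo_self_pipe *sp)
{
    char c = 'X';
    ssize_t n;
    do {
        n = write(sp->fds[1], &c, 1);
    } while (n == -1 && errno == EINTR);
    return n == 1 ? 0 : -1;
}

/* Returns 1 if a signal is pending, checked without blocking (zero timeout). */
int demo_self_pipe_pending(struct demo_self_pipe *sp)
{
    fd_set fds;
    struct timeval tv = { 0, 0 };

    FD_ZERO(&fds);
    FD_SET(sp->fds[0], &fds);
    if (select(sp->fds[0] + 1, &fds, NULL, NULL, &tv) <= 0) {
        return 0;
    }
    return FD_ISSET(sp->fds[0], &fds) ? 1 : 0;
}

/* Consume one pending signal so the next select() blocks again. */
int demo_self_pipe_clear(struct demo_self_pipe *sp)
{
    char c;
    return read(sp->fds[0], &c, 1) == 1 ? 0 : -1;
}
```

Because the read end is an ordinary fd, it can sit in the same fd_set as `server_fd`, which is how server_accept waits for sockets and signals in a single select() call.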


@@ -3,6 +3,7 @@
#include <sys/types.h> #include <sys/types.h>
#include <unistd.h> #include <unistd.h>
#include <signal.h> /* for sig_atomic_t */
#include "flexnbd.h" #include "flexnbd.h"
#include "parse.h" #include "parse.h"
@@ -28,8 +29,6 @@ struct server {
union mysockaddr bind_to; union mysockaddr bind_to;
/** (static) file name to serve */ /** (static) file name to serve */
char* filename; char* filename;
/** file name of INCOMPLETE flag */
char* filename_incomplete;
/** TCP backlog for listen() */ /** TCP backlog for listen() */
int tcp_backlog; int tcp_backlog;
/** (static) file name of UNIX control socket (or NULL if none) */ /** (static) file name of UNIX control socket (or NULL if none) */
@@ -37,9 +36,6 @@ struct server {
/** size of file */ /** size of file */
uint64_t size; uint64_t size;
/** Claims around any I/O to this file */
struct flexthread_mutex * l_io;
/** to interrupt accept loop and clients, write() to close_signal[1] */ /** to interrupt accept loop and clients, write() to close_signal[1] */
struct self_pipe * close_signal; struct self_pipe * close_signal;
@@ -53,22 +49,53 @@ struct server {
/* Claimed around any updates to the ACL. */ /* Claimed around any updates to the ACL. */
struct flexthread_mutex * l_acl; struct flexthread_mutex * l_acl;
/* Claimed around starting a mirror so that it doesn't race with
* shutting down on a SIGTERM. */
struct flexthread_mutex * l_start_mirror;
struct mirror* mirror; struct mirror* mirror;
struct mirror_super * mirror_super; struct mirror_super * mirror_super;
/* This is used to stop the mirror from starting after we
* receive a SIGTERM */
int mirror_can_start;
int server_fd; int server_fd;
int control_fd; int control_fd;
struct bitset_mapping* allocation_map; /* the allocation_map keeps track of which blocks in the backing file
* have been allocated, or part-allocated on disc, with unallocated
* blocks presumed to contain zeroes (i.e. represented as sparse files
* by the filesystem). We can use this information when receiving
* incoming writes, and avoid writing zeroes to unallocated sections
* of the file which would needlessly increase disc usage. This
* bitmap will start at all-zeroes for an empty file, and tend towards
* all-ones as the file is written to (i.e. we assume that allocated
* blocks can never become unallocated again, as is the case with ext3
* at least).
*/
struct bitset * allocation_map;
/* when starting up, this thread builds the allocation_map */
pthread_t allocation_map_builder_thread;
/* when the thread has finished, it sets this to 1 */
volatile sig_atomic_t allocation_map_built;
int max_nbd_clients; int max_nbd_clients;
struct client_tbl_entry *nbd_client; struct client_tbl_entry *nbd_client;
/** Should clients use the killswitch? */
int use_killswitch;
/** If this isn't set, newly accepted clients will be closed immediately */
int allow_new_clients;
/* Marker for whether this server has control over the data in /* Marker for whether this server has control over the data in
* the file, or if we're waiting to receive it from an inbound * the file, or if we're waiting to receive it from an inbound
* migration which hasn't yet finished. * migration which hasn't yet finished.
*
* It's the value which controls the exit status of a serve or
* listen process.
*/ */
int has_control; int success;
}; };
struct server * server_create( struct server * server_create(
@@ -80,25 +107,46 @@ struct server * server_create(
int acl_entries, int acl_entries,
char** s_acl_entries, char** s_acl_entries,
int max_nbd_clients, int max_nbd_clients,
int has_control ); int use_killswitch,
int success );
void server_destroy( struct server * ); void server_destroy( struct server * );
int server_is_closed(struct server* serve); int server_is_closed(struct server* serve);
void server_dirty(struct server *serve, off64_t from, int len);
void server_lock_io( struct server * serve);
void server_unlock_io( struct server* serve );
void serve_signal_close( struct server *serve ); void serve_signal_close( struct server *serve );
void serve_wait_for_close( struct server * serve ); void serve_wait_for_close( struct server * serve );
void server_replace_acl( struct server *serve, struct acl * acl); void server_replace_acl( struct server *serve, struct acl * acl);
void server_control_arrived( struct server *serve ); void server_control_arrived( struct server *serve );
int server_is_in_control( struct server *serve ); int server_is_in_control( struct server *serve );
int server_default_deny( struct server * serve ); int server_default_deny( struct server * serve );
int server_io_locked( struct server * serve );
int server_acl_locked( struct server * serve ); int server_acl_locked( struct server * serve );
void server_lock_acl( struct server *serve ); void server_lock_acl( struct server *serve );
void server_unlock_acl( struct server *serve ); void server_unlock_acl( struct server *serve );
void server_lock_start_mirror( struct server *serve );
void server_unlock_start_mirror( struct server *serve );
int server_is_mirroring( struct server * serve );
uint64_t server_mirror_bytes_remaining( struct server * serve );
uint64_t server_mirror_eta( struct server * serve );
int do_serve( struct server * ); void server_abandon_mirror( struct server * serve );
void server_prevent_mirror_start( struct server *serve );
void server_allow_mirror_start( struct server *serve );
int server_mirror_can_start( struct server *serve );
/* These functions are used by mirror around the final pass, to close
* existing clients and prevent new ones from being accepted
*/
void server_forbid_new_clients( struct server *serve );
void server_close_clients( struct server *serve );
void server_join_clients( struct server *serve );
void server_allow_new_clients( struct server *serve );
/* Returns a count (ish) of the number of currently-running client threads */
int server_count_clients( struct server *params );
void server_unlink( struct server * serve );
int do_serve( struct server *, struct self_pipe * );
struct mode_readwrite_params { struct mode_readwrite_params {
union mysockaddr connect_to; union mysockaddr connect_to;
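The allocation_map comment in struct server describes a block-granular record of which parts of the backing file are allocated: one flag per block, all clear for an empty (sparse) file, and only ever set as the file is written. This diff does not show the internals of `struct bitset`, so the following is a hypothetical sketch of such a map (one bit per `resolution` bytes), not flexnbd's implementation.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical block map: one bit per `resolution` bytes of the file. */
struct demo_bitmap {
    uint64_t size;       /* bytes of file covered */
    uint64_t resolution; /* bytes represented by each bit; must be nonzero */
    unsigned char *bits; /* 0 = never written (sparse), 1 = allocated */
};

struct demo_bitmap *demo_bitmap_alloc(uint64_t size, uint64_t resolution)
{
    struct demo_bitmap *b = malloc(sizeof *b);
    if (b == NULL) {
        return NULL;
    }
    uint64_t nbits = (size + resolution - 1) / resolution; /* round up */
    size_t nbytes = (size_t) ((nbits + 7) / 8);

    b->size = size;
    b->resolution = resolution;
    /* calloc(0, ...) may return NULL, so always ask for at least one byte. */
    b->bits = calloc(nbytes ? nbytes : 1, 1);
    if (b->bits == NULL) {
        free(b);
        return NULL;
    }
    return b;
}

/* Mark every block touched by [from, from + len) as allocated. Bits are
 * only ever set, matching the assumption that allocated blocks stay
 * allocated. The caller must keep the range within b->size. */
void demo_bitmap_set_range(struct demo_bitmap *b, uint64_t from, uint64_t len)
{
    if (len == 0) {
        return;
    }
    uint64_t first = from / b->resolution;
    uint64_t last = (from + len - 1) / b->resolution;
    for (uint64_t i = first; i <= last; i++) {
        b->bits[i / 8] |= (unsigned char) (1u << (i % 8));
    }
}

int demo_bitmap_is_set(const struct demo_bitmap *b, uint64_t offset)
{
    uint64_t i = offset / b->resolution;
    return (b->bits[i / 8] >> (i % 8)) & 1;
}

void demo_bitmap_free(struct demo_bitmap *b)
{
    if (b != NULL) {
        free(b->bits);
        free(b);
    }
}
```

A write that straddles a block boundary, such as 200 bytes at offset 4000 with 4096-byte blocks, marks both blocks it touches.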

src/sockutil.c Normal file

@@ -0,0 +1,249 @@
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <arpa/inet.h>
#include <netinet/tcp.h>
#include <sys/un.h>
#include "sockutil.h"
#include "util.h"
size_t sockaddr_size( const struct sockaddr* sa )
{
struct sockaddr_un* un = (struct sockaddr_un*) sa;
size_t ret = 0;
switch( sa->sa_family ) {
case AF_INET:
ret = sizeof( struct sockaddr_in );
break;
case AF_INET6:
ret = sizeof( struct sockaddr_in6 );
break;
case AF_UNIX:
ret = SUN_LEN( un ); /* SUN_LEN already counts the bytes before sun_path */
break;
}
return ret;
}
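For AF_UNIX, the length handed to bind() covers `sun_family` plus the path. glibc and the BSDs define `SUN_LEN(ptr)` as the offset of `sun_path` plus `strlen(sun_path)`, so it already includes `sun_family` and nothing should be added to it. Here is a hypothetical helper, `demo_sockaddr_size`, that computes the same value. It avoids the `SUN_LEN` macro, which glibc only exposes under some feature-test settings, and it stops at the end of `sun_path` even if the path is not NUL-terminated.

```c
#include <stddef.h>      /* offsetof */
#include <sys/socket.h>
#include <sys/un.h>
#include <netinet/in.h>

/* Size of the concrete sockaddr behind sa, or 0 for an unknown family. */
size_t demo_sockaddr_size(const struct sockaddr *sa)
{
    switch (sa->sa_family) {
    case AF_INET:
        return sizeof(struct sockaddr_in);
    case AF_INET6:
        return sizeof(struct sockaddr_in6);
    case AF_UNIX: {
        const struct sockaddr_un *un = (const struct sockaddr_un *) sa;
        size_t path_len = 0;
        /* Bounded strlen: sun_path need not be NUL-terminated. */
        while (path_len < sizeof(un->sun_path) && un->sun_path[path_len] != '\0') {
            path_len++;
        }
        /* Equals SUN_LEN(un): the offset of sun_path already covers
         * sun_family, so it must not be counted a second time. */
        return offsetof(struct sockaddr_un, sun_path) + path_len;
    }
    default:
        return 0;
    }
}
```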
const char* sockaddr_address_string( const struct sockaddr* sa, char* dest, size_t len )
{
NULLCHECK( sa );
NULLCHECK( dest );
struct sockaddr_in* in = ( struct sockaddr_in* ) sa;
struct sockaddr_in6* in6 = ( struct sockaddr_in6* ) sa;
struct sockaddr_un* un = ( struct sockaddr_un* ) sa;
unsigned short real_port = ntohs( in->sin_port ); // common to in and in6
size_t size;
const char* ret = NULL;
memset( dest, 0, len );
if ( sa->sa_family == AF_INET ) {
ret = inet_ntop( AF_INET, &in->sin_addr, dest, len );
} else if ( sa->sa_family == AF_INET6 ) {
ret = inet_ntop( AF_INET6, &in6->sin6_addr, dest, len );
} else if ( sa->sa_family == AF_UNIX ) {
ret = strncpy( dest, un->sun_path, len > 0 ? len - 1 : 0 ); /* dest was zeroed above, so it stays NUL-terminated */
}
if ( ret == NULL ) {
strncpy( dest, "???", len );
}
if ( NULL != ret && real_port > 0 && sa->sa_family != AF_UNIX ) {
size = strlen( dest );
snprintf( dest + size, len - size, " port %d", real_port );
}
return ret;
}
int sock_set_reuseaddr( int fd, int optval )
{
return setsockopt( fd, SOL_SOCKET, SO_REUSEADDR, &optval, sizeof(optval) );
}
/* Set the tcp_nodelay option */
int sock_set_tcp_nodelay( int fd, int optval )
{
return setsockopt( fd, IPPROTO_TCP, TCP_NODELAY, &optval, sizeof(optval) );
}
int sock_set_nonblock( int fd, int optval )
{
int flags = fcntl( fd, F_GETFL );
if ( flags == -1 ) {
return -1;
}
if ( optval ) {
flags = flags | O_NONBLOCK;
} else {
flags = flags & (~O_NONBLOCK);
}
return fcntl( fd, F_SETFL, flags );
}
int sock_try_bind( int fd, const struct sockaddr* sa )
{
int bind_result;
char s_address[256];
int retry = 1;
sockaddr_address_string( sa, &s_address[0], 256 );
do {
bind_result = bind( fd, sa, sockaddr_size( sa ) );
if ( 0 == bind_result ) {
info( "Bound to %s", s_address );
break;
}
else {
warn( SHOW_ERRNO( "Couldn't bind to %s", s_address ) );
switch ( errno ) {
/* bind() can give us EACCES, EADDRINUSE, EADDRNOTAVAIL, EBADF,
* EINVAL, ENOTSOCK, EFAULT, ELOOP, ENAMETOOLONG, ENOENT,
* ENOMEM, ENOTDIR, EROFS
*
* Any of these other than EADDRINUSE & EADDRNOTAVAIL signify
* that there's a logic error somewhere.
*
* EADDRINUSE is fatal: if there's something already where we
* want to be listening, we have no guarantees that any clients
* will cope with it.
*/
case EADDRNOTAVAIL:
debug( "retrying" );
sleep( 1 );
continue;
case EADDRINUSE:
warn( "%s in use, giving up.", s_address );
retry = 0;
break;
default:
warn( "giving up" );
retry = 0;
}
}
} while ( retry );
return bind_result;
}
int sock_try_select(int nfds, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, struct timeval *timeout)
{
int result;
do {
result = select(nfds, readfds, writefds, exceptfds, timeout);
if ( errno != EINTR ) {
break;
}
} while ( result == -1 );
return result;
}
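sock_try_select retries select() when a signal interrupts it (EINTR). The same policy can be written so that errno is consulted only after select() has actually failed, which avoids reading a stale value left by an earlier call. The following is a hypothetical standalone version:

```c
#include <errno.h>
#include <sys/select.h>

/* Call select(), retrying only when it was interrupted by a signal.
 * Any other failure, or any successful result, is returned as-is. */
int demo_select_retry(int nfds, fd_set *readfds, fd_set *writefds,
                      fd_set *exceptfds, struct timeval *timeout)
{
    int result;
    do {
        result = select(nfds, readfds, writefds, exceptfds, timeout);
    } while (result == -1 && errno == EINTR);
    return result;
}
```

The Linux select(2) manual page says the contents of `*timeout` are undefined after an error, so a caller that needs a firm deadline should recompute it before each retry.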
int sock_try_connect( int fd, struct sockaddr* to, socklen_t addrlen, int wait )
{
fd_set fds;
struct timeval tv = { wait, 0 };
int result = 0;
if ( sock_set_nonblock( fd, 1 ) == -1 ) {
warn( SHOW_ERRNO( "Failed to set socket non-blocking for connect()" ) );
return connect( fd, to, addrlen );
}
FD_ZERO( &fds );
FD_SET( fd, &fds );
do {
result = connect( fd, to, addrlen );
if ( result == -1 ) {
switch( errno ) {
case EINPROGRESS:
result = 0;
break; /* success */
case EAGAIN:
case EINTR:
/* Try connect() again. This only breaks out of the switch,
* not the do...while loop. since result == -1, we go again.
*/
break;
default:
warn( SHOW_ERRNO( "Failed to connect()" ) );
goto out;
}
}
} while ( result == -1 );
if ( -1 == sock_try_select( FD_SETSIZE, NULL, &fds, NULL, &tv) ) {
warn( SHOW_ERRNO( "failed to select() on non-blocking connect" ) );
result = -1;
goto out;
}
if ( !FD_ISSET( fd, &fds ) ) {
result = -1;
errno = ETIMEDOUT;
goto out;
}
int scratch;
socklen_t s_size = sizeof( scratch );
if ( getsockopt( fd, SOL_SOCKET, SO_ERROR, &scratch, &s_size ) == -1 ) {
result = -1;
warn( SHOW_ERRNO( "getsockopt() failed" ) );
goto out;
}
if ( scratch == EINPROGRESS ) {
scratch = ETIMEDOUT;
}
result = scratch ? -1 : 0;
errno = scratch;
out:
if ( sock_set_nonblock( fd, 0 ) == -1 ) {
warn( SHOW_ERRNO( "Failed to make socket blocking after connect()" ) );
return -1;
}
debug( "sock_try_connect: %i", result );
return result;
}
int sock_try_close( int fd )
{
int result;
do {
result = close( fd );
if ( result == -1 ) {
if ( EINTR == errno ) {
continue; /* retry EINTR */
} else {
warn( SHOW_ERRNO( "Failed to close() fd %i", fd ) );
break; /* Other errors get reported */
}
}
} while( 0 );
return result;
}

src/sockutil.h Normal file

@@ -0,0 +1,41 @@
#ifndef SOCKUTIL_H
#define SOCKUTIL_H
#include <sys/time.h>
#include <sys/socket.h>
#include <sys/select.h>
/* Returns the size of the sockaddr, or 0 on error */
size_t sockaddr_size(const struct sockaddr* sa);
/* Convert a sockaddr into a printable address string, appending " port N"
* for IPv4/IPv6 addresses with a nonzero port. Like inet_ntop, it returns
* dest if successful, NULL otherwise. In the latter case, dest will contain "???"
*/
const char* sockaddr_address_string(const struct sockaddr* sa, char* dest, size_t len);
/* Set the SO_REUSEADDR option */
int sock_set_reuseaddr(int fd, int optval);
/* Set the tcp_nodelay option */
int sock_set_tcp_nodelay(int fd, int optval);
/* TODO: Set the tcp_cork option */
// int sock_set_cork(int fd, int optval);
int sock_set_nonblock(int fd, int optval);
/* Attempt to bind the fd to the sockaddr, retrying common transient failures */
int sock_try_bind(int fd, const struct sockaddr* sa);
/* Try to call select(), retrying EINTR */
int sock_try_select(int nfds, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, struct timeval *timeout);
/* Try to call connect(), timing out after wait seconds */
int sock_try_connect( int fd, struct sockaddr* to, socklen_t addrlen, int wait );
/* Try to call close(), retrying EINTR */
int sock_try_close( int fd );
#endif


@@ -8,20 +8,63 @@ struct status * status_create( struct server * serve )
struct status * status; struct status * status;
status = xmalloc( sizeof( struct status ) ); status = xmalloc( sizeof( struct status ) );
status->has_control = serve->has_control; status->pid = getpid();
status->size = serve->size;
status->has_control = serve->success;
status->clients_allowed = serve->allow_new_clients;
status->num_clients = server_count_clients( serve );
server_lock_start_mirror( serve );
status->is_mirroring = NULL != serve->mirror; status->is_mirroring = NULL != serve->mirror;
if ( status->is_mirroring ) {
status->migration_duration = monotonic_time_ms();
if ( ( serve->mirror->migration_started ) < status->migration_duration ) {
status->migration_duration -= serve->mirror->migration_started;
} else {
status->migration_duration = 0;
}
status->migration_duration /= 1000;
status->migration_speed = serve->mirror->all_dirty / ( status->migration_duration + 1 );
status->migration_speed_limit = serve->mirror->max_bytes_per_second;
status->migration_seconds_left = server_mirror_eta( serve );
}
server_unlock_start_mirror( serve );
return status; return status;
} }
#define BOOL_S(var) (var ? "true" : "false" ) #define BOOL_S(var) (var ? "true" : "false" )
#define PRINT_FIELD( var ) \ #define PRINT_BOOL( var ) \
do{dprintf( fd, #var "=%s ", BOOL_S( status->var ) );}while(0) do{dprintf( fd, #var "=%s ", BOOL_S( status->var ) );}while(0)
#define PRINT_INT( var ) \
do{dprintf( fd, #var "=%d ", status->var );}while(0)
#define PRINT_UINT64( var ) \
do{dprintf( fd, #var "=%"PRIu64" ", status->var );}while(0)
int status_write( struct status * status, int fd ) int status_write( struct status * status, int fd )
{ {
PRINT_FIELD( is_mirroring ); PRINT_INT( pid );
PRINT_FIELD( has_control ); PRINT_UINT64( size );
PRINT_BOOL( is_mirroring );
PRINT_BOOL( clients_allowed );
PRINT_INT( num_clients );
PRINT_BOOL( has_control );
if ( status->is_mirroring ) {
PRINT_UINT64( migration_speed );
PRINT_UINT64( migration_duration );
PRINT_UINT64( migration_seconds_left );
if ( status->migration_speed_limit < UINT64_MAX ) {
PRINT_UINT64( migration_speed_limit );
};
}
dprintf(fd, "\n"); dprintf(fd, "\n");
return 1; return 1;
} }
@@ -32,3 +75,4 @@ void status_destroy( struct status * status )
NULLCHECK( status ); NULLCHECK( status );
free( status ); free( status );
} }
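`status_create()` reports `migration_duration` in whole seconds. It takes the millisecond difference from `monotonic_time_ms()`, clamps it to 0 if the start time is not earlier than now, and divides by 1000. It then derives `migration_speed` as `all_dirty / ( duration + 1 )`, where the `+ 1` avoids dividing by zero during the first second. The sketch below isolates that arithmetic into two pure functions so it can be checked on its own. The function names are hypothetical.

```c
#include <stdint.h>

/* Elapsed whole seconds since `started_ms`, clamped to 0 if the start
 * time is not in the past (as status_create() does). */
uint64_t migration_duration_secs( uint64_t now_ms, uint64_t started_ms )
{
	uint64_t elapsed_ms = started_ms < now_ms ? now_ms - started_ms : 0;
	return elapsed_ms / 1000;
}

/* Average transfer rate in bytes/second; the + 1 avoids a division by
 * zero and slightly underestimates the true rate. */
uint64_t migration_speed( uint64_t bytes, uint64_t duration_secs )
{
	return bytes / ( duration_secs + 1 );
}
```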


@@ -17,6 +17,12 @@
* *
* The following status fields are defined: * The following status fields are defined:
* *
* pid:
* The current process ID.
*
* size:
* The size of the backing file being served, in bytes.
*
* has_control: * has_control:
* This will be false when the server is listening for an incoming * This will be false when the server is listening for an incoming
* migration. It will switch to true when the end-of-migration * migration. It will switch to true when the end-of-migration
@@ -24,21 +30,60 @@
* If the server is started in "serve" mode, this will never be * If the server is started in "serve" mode, this will never be
* false. * false.
* *
* clients_allowed:
* This will be false if the server is not currently allowing new
* connections, for instance, if we're in the migration endgame.
*
* num_clients:
* The number of clients currently connected. If we're in the
* migration endgame, it should be 0.
*
* is_migrating: * is_migrating:
* This will be false when the server is started in either "listen" * This will be false when the server is started in either "listen"
* or "serve" mode. It will become true when a server in "serve" * or "serve" mode. It will become true when a server in "serve"
* mode starts a migration, and will become false again when the * mode starts a migration, and will become false again when the
* migration terminates, successfully or not. * migration terminates, successfully or not.
* If the server is currently in "listen" mode, this will never b * If the server is currently in "listen" mode, this will never be
* true. * true.
*
*
* If is_migrating is true, then a number of other attributes may appear,
* relating to the progress of the migration.
*
* migration_duration:
* How long the migration has been running, in seconds.
*
* migration_speed:
* Network transfer speed, in bytes/second. This only takes dirty bytes
* into account.
*
* migration_speed_limit:
* If set, the speed we're going to try to limit the migration to.
*
* migration_seconds_left:
* Our current best estimate of how many seconds are left before the
* migration is finished.
*
*/ */
#include "serve.h" #include "serve.h"
#include <sys/types.h>
#include <unistd.h>
struct status { struct status {
pid_t pid;
uint64_t size;
int has_control; int has_control;
int clients_allowed;
int num_clients;
int is_mirroring; int is_mirroring;
uint64_t migration_duration;
uint64_t migration_speed;
uint64_t migration_speed_limit;
uint64_t migration_seconds_left;
}; };
/** Create a status object for the given server. */ /** Create a status object for the given server. */
@@ -53,3 +98,4 @@ void status_destroy( struct status * );
#endif #endif
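The header documents a status line made of space-separated `key=value` pairs, for example `pid=... size=... is_mirroring=false ...`. Below is a minimal C sketch that reads one field back out of such a line, assuming exactly that format. `status_field()` is a hypothetical helper and is not part of flexnbd.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: find `key` in a line of space-separated key=value
 * pairs (the format status_write() produces) and copy its value into
 * `out`, truncating to fit. Returns 1 if the key was found, else 0. */
int status_field( const char *line, const char *key, char *out, size_t outlen )
{
	size_t keylen = strlen( key );
	const char *p = line;

	if ( outlen == 0 ) {
		return 0;
	}
	while ( *p ) {
		/* Each token runs up to the next space or newline. */
		const char *end = p + strcspn( p, " \n" );
		size_t toklen = (size_t)( end - p );

		if ( toklen > keylen && strncmp( p, key, keylen ) == 0 && p[keylen] == '=' ) {
			const char *value = p + keylen + 1;
			size_t vlen = (size_t)( end - value );
			if ( vlen >= outlen ) {
				vlen = outlen - 1;
			}
			memcpy( out, value, vlen );
			out[vlen] = '\0';
			return 1;
		}
		/* Skip the token and the separator after it, if any. */
		p = *end ? end + 1 : end;
	}
	return 0;
}
```

Requiring `=` immediately after the key stops `is` from matching `is_mirroring`.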


@@ -6,6 +6,7 @@
#include <errno.h> #include <errno.h>
#include <malloc.h> #include <malloc.h>
#include <unistd.h> #include <unistd.h>
#include <time.h>
#include "util.h" #include "util.h"
@@ -50,6 +51,25 @@ void mylog(int line_level, const char* format, ...)
va_end(argptr); va_end(argptr);
} }
uint64_t monotonic_time_ms()
{
struct timespec ts;
uint64_t seconds_ms, nanoseconds_ms;
FATAL_IF_NEGATIVE(
clock_gettime(CLOCK_MONOTONIC, &ts),
SHOW_ERRNO( "clock_gettime failed" )
);
seconds_ms = ts.tv_sec;
seconds_ms = seconds_ms * 1000;
nanoseconds_ms = ts.tv_nsec;
nanoseconds_ms = nanoseconds_ms / 1000000;
return seconds_ms + nanoseconds_ms;
}
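`monotonic_time_ms()` widens `tv_sec` to `uint64_t` before multiplying by 1000, which prevents the product from overflowing where `time_t` is 32 bits. It also truncates `tv_nsec` to whole milliseconds. The sketch below performs the same conversion as a testable pure function, plus a wrapper that returns 0 on error instead of aborting as the original does via `FATAL_IF_NEGATIVE`. Both names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Convert a timespec to whole milliseconds, discarding sub-millisecond
 * precision. tv_sec is widened to 64 bits before the multiplication. */
uint64_t timespec_to_ms( const struct timespec *ts )
{
	return (uint64_t) ts->tv_sec * 1000 + (uint64_t) ts->tv_nsec / 1000000;
}

/* Current CLOCK_MONOTONIC time in ms, or 0 if clock_gettime() fails. */
uint64_t monotonic_ms_sketch( void )
{
	struct timespec ts;
	if ( clock_gettime( CLOCK_MONOTONIC, &ts ) == -1 ) {
		return 0;
	}
	return timespec_to_ms( &ts );
}
```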
void* xrealloc(void* ptr, size_t size) void* xrealloc(void* ptr, size_t size)
{ {


@@ -8,6 +8,7 @@
#include <stdlib.h> #include <stdlib.h>
#include <sys/types.h> #include <sys/types.h>
#include <unistd.h> #include <unistd.h>
#include <inttypes.h>
void* xrealloc(void* ptr, size_t size); void* xrealloc(void* ptr, size_t size);
void* xmalloc(size_t size); void* xmalloc(size_t size);
@@ -85,9 +86,13 @@ void error_handler(int fatal);
/* mylog a line at the given level (0 being most verbose) */ /* mylog a line at the given level (0 being most verbose) */
void mylog(int line_level, const char* format, ...); void mylog(int line_level, const char* format, ...);
/* Returns the current time, in milliseconds, from CLOCK_MONOTONIC */
uint64_t monotonic_time_ms(void);
#define levstr(i) (i==0?'D':(i==1?'I':(i==2?'W':(i==3?'E':'F')))) #define levstr(i) (i==0?'D':(i==1?'I':(i==2?'W':(i==3?'E':'F'))))
#define myloglev(level, msg, ...) mylog( level, "%c:%d %p %s:%d: "msg"\n", levstr(level), getpid(),pthread_self(), __FILE__, __LINE__, ##__VA_ARGS__ ) #define myloglev(level, msg, ...) mylog( level, "%"PRIu64":%c:%d %p %s:%d: "msg"\n", monotonic_time_ms(), levstr(level), getpid(),pthread_self(), __FILE__, __LINE__, ##__VA_ARGS__ )
#ifdef DEBUG #ifdef DEBUG
# define debug(msg, ...) myloglev(0, msg, ##__VA_ARGS__) # define debug(msg, ...) myloglev(0, msg, ##__VA_ARGS__)
@@ -148,6 +153,9 @@ void mylog(int line_level, const char* format, ...);
#define NULLCHECK(value) FATAL_IF_NULL(value, "BUG: " #value " is null") #define NULLCHECK(value) FATAL_IF_NULL(value, "BUG: " #value " is null")
#define SHOW_ERRNO( msg, ... ) msg ": %s (%i)", ##__VA_ARGS__, ( errno == 0 ? "EOF" : strerror(errno) ), errno
#define WARN_IF_NEGATIVE( value, msg, ... ) do { if ( ( value ) < 0 ) { warn( msg, ##__VA_ARGS__ ); } } while( 0 )
#endif #endif
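util.h now includes `<inttypes.h>` for `PRIu64`. That macro expands to the correct printf conversion for `uint64_t` on the target (for example `lu` on 64-bit Linux and `llu` on 32-bit), which lets `PRINT_UINT64` and `myloglev` print 64-bit values portably. Below is a small sketch of the same pattern using snprintf(). The function name is hypothetical.

```c
#include <inttypes.h>
#include <stdio.h>

/* Format "name=value " the way PRINT_UINT64 does, using PRIu64 so the
 * conversion matches uint64_t on any platform. Returns snprintf()'s count. */
int format_u64_field( char *buf, size_t len, const char *name, uint64_t value )
{
	return snprintf( buf, len, "%s=%" PRIu64 " ", name, value );
}
```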


@@ -0,0 +1,13 @@
{
avoid_glibc_bug_do_lookup
Memcheck:Addr8
fun:do_lookup_x
obj:*
fun:_dl_lookup_symbol_x
}
{
avoid_glibc_bug_check_match
Memcheck:Addr8
fun:check_match.12149
}


@@ -5,7 +5,7 @@ require 'file_writer'
class Environment class Environment
attr_reader( :blocksize, :filename1, :filename2, :ip, attr_reader( :blocksize, :filename1, :filename2, :ip,
:port1, :port2, :nbd1, :nbd2, :file1, :file2, :rebind_port1 ) :port1, :port2, :nbd1, :nbd2, :file1, :file2 )
def initialize def initialize
@blocksize = 1024 @blocksize = 1024
@@ -14,15 +14,20 @@ class Environment
@ip = "127.0.0.1" @ip = "127.0.0.1"
@available_ports = [*40000..41000] - listening_ports @available_ports = [*40000..41000] - listening_ports
@port1 = @available_ports.shift @port1 = @available_ports.shift
@rebind_port1 = @available_ports.shift
@port2 = @available_ports.shift @port2 = @available_ports.shift
@rebind_port2 = @available_ports.shift @nbd1 = FlexNBD::FlexNBD.new("../../build/flexnbd", @ip, @port1)
@nbd1 = FlexNBD.new("../../build/flexnbd", @ip, @port1, @ip, @rebind_port1) @nbd2 = FlexNBD::FlexNBD.new("../../build/flexnbd", @ip, @port2)
@nbd2 = FlexNBD.new("../../build/flexnbd", @ip, @port2, @ip, @rebind_port2)
@fake_pid = nil @fake_pid = nil
end end
def proxy1(port=@port2)
@nbd1.proxy(@ip, port)
end
def proxy2(port=@port1)
@nbd2.proxy(@ip, port)
end
def serve1(*acl) def serve1(*acl)
@nbd1.serve(@filename1, *acl) @nbd1.serve(@filename1, *acl)
@@ -42,6 +47,10 @@ class Environment
end end
def break1
@nbd1.break
end
def acl1( *acl ) def acl1( *acl )
@nbd1.acl( *acl ) @nbd1.acl( *acl )
end end
@@ -69,6 +78,14 @@ class Environment
@nbd1.mirror_unchecked( @nbd2.ip, @nbd2.port, nil, nil, 10 ) @nbd1.mirror_unchecked( @nbd2.ip, @nbd2.port, nil, nil, 10 )
end end
def mirror12_unlink
@nbd1.mirror_unlink( @nbd2.ip, @nbd2.port, 2 )
end
def write1( data )
@nbd1.write( 0, data )
end
def writefile1(data) def writefile1(data)
@file1 = FileWriter.new(@filename1, @blocksize).write(data) @file1 = FileWriter.new(@filename1, @blocksize).write(data)
@@ -111,20 +128,19 @@ class Environment
end end
def run_fake( name, addr, port, rebind_addr = addr, rebind_port = port ) def run_fake( name, addr, port, sock=nil )
fakedir = File.join( File.dirname( __FILE__ ), "fakes" ) fakedir = File.join( File.dirname( __FILE__ ), "fakes" )
fake = Dir[File.join( fakedir, name ) + "*"].sort.find { |fn| fakeglob = File.join( fakedir, name ) + "*"
fake = Dir[fakeglob].sort.find { |fn|
File.executable?( fn ) File.executable?( fn )
} }
raise "no fake executable" unless fake raise "no fake executable at #{fakeglob}" unless fake
raise "no addr" unless addr raise "no addr" unless addr
raise "no port" unless port raise "no port" unless port
raise "no rebind_addr" unless rebind_addr
raise "no rebind_port" unless rebind_port
@fake_pid = fork do @fake_pid = fork do
exec [fake, addr, port, @nbd1.pid, rebind_addr, rebind_port].map{|x| x.to_s}.join(" ") exec [fake, addr, port, @nbd1.pid, sock].map{|x| x.to_s}.join(" ")
end end
sleep(0.5) sleep(0.5)
end end


@@ -0,0 +1,35 @@
#!/usr/bin/env ruby
# encoding: utf-8
# Open a server, accept a client, then cancel the migration by issuing
# a break command.
require 'flexnbd/fake_dest'
include FlexNBD
addr, port, src_pid, sock = *ARGV
server = FakeDest.new( addr, port )
client = server.accept
ctrl = UNIXSocket.open( sock )
Process.kill("STOP", src_pid.to_i)
ctrl.write( "break\n" )
ctrl.close_write
client.write_hello
Process.kill("CONT", src_pid.to_i)
fail "Unexpected control response" unless
ctrl.read =~ /0: mirror stopped/
client2 = nil
begin
client2 = server.accept( "Expected timeout" )
fail "Unexpected reconnection"
rescue Timeout::Error
# expected
end
client.close
exit(0)


@@ -1,29 +0,0 @@
#!/usr/bin/env ruby
# encoding: utf-8
# Open a server, accept a client, then we expect a single write
# followed by an entrust. Disconnect after the entrust. We expect a
# reconnection followed by a full mirror.
require 'flexnbd/fake_dest'
include FlexNBD
addr, port, src_pid = *ARGV
server = FakeDest.new( addr, port )
client = server.accept
client.write_hello
write_req = client.read_request
data = client.read_data( write_req[:len] )
client.write_reply( write_req[:handle], 0 )
entrust_req = client.read_request
fail "Not an entrust" unless entrust_req[:type] == 65536
client.close
client2 = server.accept
client2.receive_mirror
exit(0)


@@ -3,7 +3,8 @@
# Open a server, accept a client, then we expect a single write # Open a server, accept a client, then we expect a single write
# followed by an entrust. However, we disconnect after the write so # followed by an entrust. However, we disconnect after the write so
# the entrust will fail. We expect a reconnection. # the entrust will fail. We don't expect a reconnection: the sender
# can't reliably spot a failed send.
require 'flexnbd/fake_dest' require 'flexnbd/fake_dest'
include FlexNBD include FlexNBD
@@ -21,7 +22,4 @@ client.write_reply( req[:handle], 0 )
client.close client.close
Process.kill("CONT", src_pid.to_i) Process.kill("CONT", src_pid.to_i)
client2 = server.accept
client2.close
exit(0) exit(0)


@@ -1,34 +0,0 @@
#!/usr/bin/env ruby
# encoding: utf-8
# Receive a mirror, but respond to the entrust with an error. There's
# currently no code path in flexnbd which can do this, but we could
# add one.
require 'flexnbd/fake_dest'
include FlexNBD
addr, port = *ARGV
server = FakeDest.new( addr, port )
client = server.accept
client.write_hello
loop do
req = client.read_request
if req[:type] == 1
client.read_data( req[:len] )
client.write_reply( req[:handle] )
else
client.write_reply( req[:handle], 1 )
break
end
end
client.close
client2 = server.accept( "Timed out waiting for a reconnection" )
client2.close
server.close
exit(0)


@@ -0,0 +1,19 @@
#!/usr/bin/env ruby
# Wait for a sender connection, send a correct hello, then sigterm the
# sender. We expect the sender to exit with a status of 6, which is
# enforced in the test.
require 'flexnbd/fake_dest'
include FlexNBD
addr, port, pid = *ARGV
server = FakeDest.new( addr, port )
client = server.accept( "Timed out waiting for a connection" )
client.write_hello
Process.kill(15, pid.to_i)
client.close
server.close
exit 0


@@ -3,8 +3,8 @@
# Connect, send a migration, entrust then *immediately* disconnect. # Connect, send a migration, entrust then *immediately* disconnect.
# This simulates a client which fails while the client is blocked. # This simulates a client which fails while the client is blocked.
# #
# We attempt to reconnect immediately afterwards to prove that we can # In this situation we expect the destination to quit with an error
# retry the mirroring. # status.
require 'flexnbd/fake_source' require 'flexnbd/fake_source'
include FlexNBD include FlexNBD
@@ -28,7 +28,11 @@ system "kill -CONT #{srv_pid}"
sleep(0.25) sleep(0.25)
client2 = FakeSource.new( addr, port, "Timed out reconnecting" ) begin
client2.close client2 = FakeSource.new( addr, port, "Expected timeout" )
fail "Unexpected reconnection"
rescue Timeout::Error
# expected
end
exit(0) exit(0)


@@ -1,15 +1,14 @@
#!/usr/bin/env ruby #!/usr/bin/env ruby
# Connect, send a migration, entrust then *immediately* disconnect. # Connect, send a migration, entrust, read the reply, then disconnect.
# This simulates a client which fails while the client is blocked. # This simulates a client which fails while the client is blocked.
# #
# We attempt to reconnect immediately afterwards to prove that we can # We expect the destination to quit with an error status.
# retry the mirroring.
require 'flexnbd/fake_source' require 'flexnbd/fake_source'
include FlexNBD include FlexNBD
addr, port, srv_pid, rebind_addr, rebind_port = *ARGV addr, port, srv_pid = *ARGV
client = FakeSource.new( addr, port, "Timed out connecting" ) client = FakeSource.new( addr, port, "Timed out connecting" )
client.read_hello client.read_hello
@@ -22,11 +21,13 @@ client.close
sleep(0.25) sleep(0.25)
client2 = FakeSource.new( addr, port, "Timed out reconnecting to mirror" )
client2.send_mirror
sleep(1) begin
client3 = FakeSource.new( rebind_addr, rebind_port, "Timed out reconnecting to read" ) client2 = FakeSource.new( addr, port, "Expected timeout" )
client3.close fail "Unexpected reconnection"
rescue Timeout::Error
# expected
end
exit(0) exit(0)


@@ -12,10 +12,11 @@ addr, port, srv_pid = *ARGV
client = FakeSource.new( addr, port, "Timed out connecting" ) client = FakeSource.new( addr, port, "Timed out connecting" )
client.read_hello client.read_hello
Process.kill( "STOP", srv_pid.to_i )
system "kill -STOP #{srv_pid}"
client.write_write_request( 0, 8 ) client.write_write_request( 0, 8 )
client.close client.close
Process.kill( "CONT", srv_pid.to_i ) system "kill -CONT #{srv_pid}"
# This sleep ensures that we don't return control to the test runner # This sleep ensures that we don't return control to the test runner
# too soon, giving the flexnbd process time to fall over if it's going # too soon, giving the flexnbd process time to fall over if it's going


@@ -13,13 +13,13 @@ addr, port, srv_pid = *ARGV
client = FakeSource.new( addr, port, "Timed out connecting" ) client = FakeSource.new( addr, port, "Timed out connecting" )
client.read_hello client.read_hello
Process.kill( "STOP", srv_pid.to_i ) system "kill -STOP #{srv_pid}"
client.write_write_request( 0, 8 ) client.write_write_request( 0, 8 )
client.write_data( "12345678" ) client.write_data( "12345678" )
client.close client.close
Process.kill( "CONT", srv_pid.to_i ) system "kill -CONT #{srv_pid}"
# This sleep ensures that we don't return control to the test runner # This sleep ensures that we don't return control to the test runner
# too soon, giving the flexnbd process time to fall over if it's going # too soon, giving the flexnbd process time to fall over if it's going


@@ -13,10 +13,8 @@ addr, port = *ARGV
client = FakeSource.new( addr, port, "Timed out connecting", "127.0.0.6" ) client = FakeSource.new( addr, port, "Timed out connecting", "127.0.0.6" )
sleep( 0.25 ) sleep( 0.25 )
client.ensure_disconnected
rsp = client.disconnected? ? 0 : 1
client.close client.close
exit(0) exit(rsp)
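The revised fake reports its result through `disconnected?`. That method treats a one-byte read returning nil (end-of-file) within two seconds as proof that the destination dropped the connection. Below is a sketch of the same check at the socket level, using poll(2) and recv(2). One difference is assumed here: the Ruby version raises `Timeout::Error` when nothing arrives, whereas this hypothetical `peer_disconnected()` returns 0 on timeout.

```c
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Returns 1 if the peer has closed the connection (recv() reports
 * end-of-file within timeout_ms), 0 if data arrived or the wait timed
 * out, and -1 on error. Like disconnected?, it consumes at most one byte. */
int peer_disconnected( int fd, int timeout_ms )
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int ready = poll( &pfd, 1, timeout_ms );

	if ( ready <= 0 ) {
		return ready; /* 0: timed out, -1: poll() failed */
	}
	char c;
	ssize_t n = recv( fd, &c, 1, 0 );
	if ( n < 0 ) {
		return -1;
	}
	return n == 0;
}
```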


@@ -0,0 +1,20 @@
#!/usr/bin/env ruby
# Connect to the listener, wait for the hello, then sigterm the
# listener. We expect the listener to exit with a status of 6, which
# is enforced in the test.
require 'flexnbd/fake_source'
include FlexNBD
addr, port, pid = *ARGV
client = FakeSource.new( addr, port, "Timed out connecting." )
client.read_hello
Process.kill( "TERM", pid.to_i )
sleep(0.2)
client.close
exit(0)


@@ -1,29 +1,18 @@
#!/usr/bin/env ruby #!/usr/bin/env ruby
# Successfully send a migration, but squat on the IP and port which # Successfully send a migration. This test just makes sure that the
# the destination wants to rebind to. The destination should retry # happy path is covered. We expect the destination to quit with a
# every second, so we give it up then attempt to connect to the new # success status.
# server.
require 'flexnbd/fake_source' require 'flexnbd/fake_source'
include FlexNBD include FlexNBD
addr, port, srv_pid, newaddr, newport = *ARGV addr, port, srv_pid, newaddr, newport = *ARGV
squatter = TCPServer.open( newaddr, newport.to_i )
client = FakeSource.new( addr, port, "Timed out connecting" ) client = FakeSource.new( addr, port, "Timed out connecting" )
client.send_mirror() client.send_mirror()
sleep(1) sleep(1)
squatter.close()
sleep(1)
client2 = FakeSource.new( newaddr, newport.to_i, "Timed out reconnecting" )
client2.read_hello
client2.read( 0, 8 )
client2.close
exit( 0 ) exit( 0 )


@@ -8,6 +8,10 @@ class FileWriter
@pattern = "" @pattern = ""
end end
def size
@blocksize * @pattern.split("").size
end
# We write in fixed block sizes, given by "blocksize" # We write in fixed block sizes, given by "blocksize"
# _ means skip a block # _ means skip a block
# 0 means write a block full of zeroes # 0 means write a block full of zeroes


@@ -21,7 +21,7 @@ class ValgrindExecutor
attr_reader :pid attr_reader :pid
def run( cmd ) def run( cmd )
@pid = fork do exec "valgrind --track-origins=yes #{cmd}" end @pid = fork do exec "valgrind --track-origins=yes --suppressions=custom.supp #{cmd}" end
end end
end # class ValgrindExecutor end # class ValgrindExecutor
@@ -97,7 +97,9 @@ class ValgrindKillingExecutor
when "line" when "line"
@error.add_line( @text ) if @found @error.add_line( @text ) if @found
when "error", "stack" when "error", "stack"
if @found
@killer.call( @error ) @killer.call( @error )
end
when "pid" when "pid"
@error.pid=@text @error.pid=@text
end end
@@ -129,18 +131,18 @@ class ValgrindKillingExecutor
def run( cmd ) def run( cmd )
@io_r, io_w = IO.pipe @io_r, io_w = IO.pipe
@pid = fork do exec( "valgrind --xml=yes --xml-fd=#{io_w.fileno} " + cmd ) end @pid = fork do exec( "valgrind --suppressions=custom.supp --xml=yes --xml-fd=#{io_w.fileno} " + cmd ) end
launch_watch_thread( @pid, @io_r ) launch_watch_thread( @pid, @io_r )
@pid @pid
end end
def call( err ) def call( err )
Process.kill( "KILL", @pid )
$stderr.puts "*"*72 $stderr.puts "*"*72
$stderr.puts "* Valgrind error spotted:" $stderr.puts "* Valgrind error spotted:"
$stderr.puts err.to_s.split("\n").map{|s| " #{s}"} $stderr.puts err.to_s.split("\n").map{|s| " #{s}"}
$stderr.puts "*"*72 $stderr.puts "*"*72
Process.kill( "KILL", @pid )
exit(1) exit(1)
end end
@@ -163,10 +165,11 @@ class ValgrindKillingExecutor
end # class ValgrindExecutor end # class ValgrindExecutor
module FlexNBD
# Noddy test class to exercise FlexNBD from the outside for testing. # Noddy test class to exercise FlexNBD from the outside for testing.
# #
class FlexNBD class FlexNBD
attr_reader :bin, :ctrl, :pid, :ip, :port, :rebind_ip, :rebind_port attr_reader :bin, :ctrl, :pid, :ip, :port
class << self class << self
def counter def counter
@@ -195,7 +198,7 @@ class FlexNBD
end end
end end
def initialize(bin, ip, port, rebind_ip = ip, rebind_port = port) def initialize( bin, ip, port )
@bin = bin @bin = bin
@do_debug = ENV['DEBUG'] @do_debug = ENV['DEBUG']
@debug = build_debug_opt @debug = build_debug_opt
@@ -204,8 +207,6 @@ class FlexNBD
@ctrl = "/tmp/.flexnbd.ctrl.#{Time.now.to_i}.#{rand}" @ctrl = "/tmp/.flexnbd.ctrl.#{Time.now.to_i}.#{rand}"
@ip = ip @ip = ip
@port = port @port = port
@rebind_ip = rebind_ip
@rebind_port = rebind_port
@kill = [] @kill = []
end end
@@ -235,13 +236,20 @@ class FlexNBD
"--addr #{ip} "\ "--addr #{ip} "\
"--port #{port} "\ "--port #{port} "\
"--file #{file} "\ "--file #{file} "\
"--rebind-addr #{rebind_ip} " \
"--rebind-port #{rebind_port} " \
"--sock #{ctrl} "\ "--sock #{ctrl} "\
"#{@debug} "\ "#{@debug} "\
"#{acl.join(' ')}" "#{acl.join(' ')}"
end end
def proxy_cmd( connect_ip, connect_port )
"#{bin}-proxy "\
"--addr #{ip} "\
"--port #{port} "\
"--conn-addr #{connect_ip} "\
"--conn-port #{connect_port} "\
"#{@debug}"
end
def read_cmd( offset, length ) def read_cmd( offset, length )
"#{bin} read "\ "#{bin} read "\
@@ -263,14 +271,36 @@ class FlexNBD
end end
def mirror_cmd(dest_ip, dest_port) def base_mirror_opts( dest_ip, dest_port )
"#{@bin} mirror "\
"--addr #{dest_ip} "\ "--addr #{dest_ip} "\
"--port #{dest_port} "\ "--port #{dest_port} "\
"--sock #{ctrl} "\ "--sock #{ctrl} "\
end
def unlink_mirror_opts( dest_ip, dest_port )
"#{base_mirror_opts( dest_ip, dest_port )} "\
"--unlink "
end
def base_mirror_cmd( opts )
"#{@bin} mirror "\
"#{opts} "\
"#{@debug}" "#{@debug}"
end end
def mirror_cmd(dest_ip, dest_port)
base_mirror_cmd( base_mirror_opts( dest_ip, dest_port ) )
end
def mirror_unlink_cmd( dest_ip, dest_port )
base_mirror_cmd( unlink_mirror_opts( dest_ip, dest_port ) )
end
def break_cmd
"#{@bin} break "\
"--sock #{ctrl} "\
"#{@debug}"
end
def status_cmd def status_cmd
"#{@bin} status "\ "#{@bin} status "\
@@ -291,34 +321,69 @@ class FlexNBD
debug( cmd ) debug( cmd )
@pid = @executor.run( cmd ) @pid = @executor.run( cmd )
start_wait_thread( @pid )
while !File.socket?(ctrl) while !File.socket?(ctrl)
pid, status = Process.wait2(@pid, Process::WNOHANG) pid, status = Process.wait2(@pid, Process::WNOHANG)
raise "server did not start (#{cmd})" if pid raise "server did not start (#{cmd})" if pid
sleep 0.1 sleep 0.1
end end
start_wait_thread( @pid )
at_exit { kill } at_exit { kill }
end end
private :run_serve_cmd private :run_serve_cmd
def serve( file, *acl) def serve( file, *acl)
run_serve_cmd( serve_cmd( file, acl ) ) cmd = serve_cmd( file, acl )
run_serve_cmd( cmd )
sleep( 0.2 ) until File.exist?( ctrl )
end end
def listen(file, *acl) def listen(file, *acl)
run_serve_cmd( listen_cmd( file, acl ) ) run_serve_cmd( listen_cmd( file, acl ) )
end end
def tcp_server_open?
# true if a TCP connection to ip:port can be established; connection errors are rescued, not raised
sock = TCPSocket.new(ip, port) rescue nil
success = !!sock
( sock.close rescue nil) if sock
success
end
def proxy( connect_ip, connect_port )
cmd = proxy_cmd( connect_ip, connect_port )
debug( cmd )
@pid = @executor.run( cmd )
until tcp_server_open?
pid, status = Process.wait2(@pid, Process::WNOHANG)
raise "server did not start (#{cmd})" if pid
sleep 0.1
end
start_wait_thread( @pid )
at_exit { kill }
end
def start_wait_thread( pid ) def start_wait_thread( pid )
@wait_thread = Thread.start do @wait_thread = Thread.start do
_, status = Process.waitpid2( pid ) _, status = Process.waitpid2( pid )
if @kill if @kill
fail "flexnbd quit with a bad status: #{status.exitstatus}" unless if status.signaled?
fail "flexnbd quit with a bad signal: #{status.inspect}" unless
@kill.include? status.termsig
else
fail "flexnbd quit with a bad status: #{status.inspect}" unless
@kill.include? status.exitstatus @kill.include? status.exitstatus
end
else else
$stderr.puts "flexnbd #{self.pid} quit" $stderr.puts "flexnbd #{self.pid} quit"
fail "flexnbd #{self.pid} quit early with status #{status.to_i}" fail "flexnbd #{self.pid} quit early with status #{status.to_i}"
@@ -383,6 +448,14 @@ class FlexNBD
end end
def mirror_unlink( dest_ip, dest_port, timeout=nil )
cmd = mirror_unlink_cmd( dest_ip, dest_port )
debug( cmd )
maybe_timeout( cmd, timeout )
end
def maybe_timeout(cmd, timeout=nil ) def maybe_timeout(cmd, timeout=nil )
stdout, stderr = "","" stdout, stderr = "",""
run = Proc.new do run = Proc.new do
@@ -411,6 +484,15 @@ class FlexNBD
end end
def break(timeout=nil)
cmd = break_cmd
debug( cmd )
maybe_timeout( cmd, timeout )
end
def acl(*acl) def acl(*acl)
cmd = acl_cmd( *acl ) cmd = acl_cmd( *acl )
debug( cmd ) debug( cmd )
@@ -434,6 +516,14 @@ class FlexNBD
end end
def paused
Process.kill( "STOP", @pid )
yield
ensure
Process.kill( "CONT", @pid )
end
protected protected
def control_command(*args) def control_command(*args)
raise "Server not running" unless @pid raise "Server not running" unless @pid
@@ -465,3 +555,5 @@ class FlexNBD
end end
end
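`start_wait_thread` now separates a flexnbd process killed by a signal from one that exited normally. Ruby's `Process::Status#exitstatus` is nil for a signalled process, so the signal number is checked via `termsig` instead. In C, the equivalent decoding of a wait status uses the POSIX `WIFSIGNALED`/`WTERMSIG` and `WIFEXITED`/`WEXITSTATUS` macros. The sketch below does this with hypothetical helper names and checks both kinds of code against a single allowed list, as the Ruby does with `@kill`.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Return 1 if a wait status is acceptable: a signalled child is matched by
 * its signal number, a child that exited normally by its exit status. */
int exit_allowed( int status, const int *allowed, int n )
{
	int code;

	if ( WIFSIGNALED( status ) ) {
		code = WTERMSIG( status );
	} else if ( WIFEXITED( status ) ) {
		code = WEXITSTATUS( status );
	} else {
		return 0; /* stopped or continued, not a termination */
	}
	for ( int i = 0; i < n; i++ ) {
		if ( allowed[i] == code ) {
			return 1;
		}
	}
	return 0;
}

/* Fork a child that exits immediately with `code`, reap it and return its
 * raw wait status, or -1 if fork() or waitpid() fails. */
int wait_status_for_exit( int code )
{
	int status = 0;
	pid_t pid = fork();

	if ( pid == -1 ) {
		return -1;
	}
	if ( pid == 0 ) {
		_exit( code );
	}
	if ( waitpid( pid, &status, 0 ) == -1 ) {
		return -1;
	}
	return status;
}
```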


@@ -32,6 +32,9 @@ module FlexNBD
end end
read_constants() read_constants()
REQUEST_MAGIC = "\x25\x60\x95\x13" unless defined?(REQUEST_MAGIC)
REPLY_MAGIC = "\x67\x44\x66\x98" unless defined?(REPLY_MAGIC)
end # module FlexNBD end # module FlexNBD


@@ -56,8 +56,6 @@ module FlexNBD
} }
end end
REPLY_MAGIC="\x67\x44\x66\x98"
def write_error( handle ) def write_error( handle )
write_reply( handle, 1 ) write_reply( handle, 1 )
end end
@@ -76,7 +74,7 @@ module FlexNBD
if opts[:magic] == :wrong if opts[:magic] == :wrong
write_rand( @sock, 4 ) write_rand( @sock, 4 )
else else
@sock.write( REPLY_MAGIC ) @sock.write( ::FlexNBD::REPLY_MAGIC )
end end
@sock.write( [err].pack("N") ) @sock.write( [err].pack("N") )
@@ -93,6 +91,10 @@ module FlexNBD
@sock.read( len ) @sock.read( len )
end end
def write_data( len )
@sock.write( len )
end
def self.parse_be64(str) def self.parse_be64(str)
raise "String is the wrong length: 8 bytes expected (#{str.length} received)" unless raise "String is the wrong length: 8 bytes expected (#{str.length} received)" unless
@@ -161,3 +163,4 @@ module FlexNBD
end # module FakeDest end # module FakeDest
end # module FlexNBD end # module FlexNBD
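`FakeDest#write_reply` writes `REPLY_MAGIC` (now shared as `::FlexNBD::REPLY_MAGIC`) followed by the error code as a big-endian 32-bit integer. In the NBD protocol, the request's 8-byte handle comes next, for 16 bytes in total. Below is a sketch of parsing such a reply in C. `nbd_parse_reply()` is a hypothetical name, and a wrong magic, such as the random bytes written when `:magic => :wrong` is passed, is rejected.

```c
#include <stdint.h>
#include <string.h>

#define NBD_REPLY_MAGIC 0x67446698u /* "\x67\x44\x66\x98" */
#define NBD_REPLY_LEN 16

/* Read a big-endian 32-bit value from 4 bytes. */
static uint32_t get_be32( const unsigned char *p )
{
	return ( (uint32_t) p[0] << 24 ) | ( (uint32_t) p[1] << 16 ) |
	       ( (uint32_t) p[2] << 8 ) | (uint32_t) p[3];
}

/* Parse magic, error code and 8-byte handle. Returns 0 on success, or -1
 * if the magic does not match (the outputs are then left untouched). */
int nbd_parse_reply( const unsigned char buf[NBD_REPLY_LEN], uint32_t *error, char handle[8] )
{
	if ( get_be32( buf ) != NBD_REPLY_MAGIC ) {
		return -1;
	}
	*error = get_be32( buf + 4 );
	memcpy( handle, buf + 8, 8 );
	return 0;
}
```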


@@ -9,11 +9,17 @@ module FlexNBD
def initialize( addr, port, err_msg, source_addr=nil, source_port=0 ) def initialize( addr, port, err_msg, source_addr=nil, source_port=0 )
timing_out( 2, err_msg ) do timing_out( 2, err_msg ) do
begin
@sock = if source_addr @sock = if source_addr
TCPSocket.new( addr, port, source_addr, source_port ) TCPSocket.new( addr, port, source_addr, source_port )
else else
TCPSocket.new( addr, port ) TCPSocket.new( addr, port )
end end
rescue Errno::ECONNREFUSED
$stderr.puts "Connection refused, retrying"
sleep(0.2)
retry
end
end end
end end
@@ -24,7 +30,7 @@ module FlexNBD
def read_hello() def read_hello()
timing_out( FlexNBD::MS_HELLO_TIME_SECS, timing_out( ::FlexNBD::MS_HELLO_TIME_SECS,
"Timed out waiting for hello." ) do "Timed out waiting for hello." ) do
fail "No hello." unless (hello = @sock.read( 152 )) && fail "No hello." unless (hello = @sock.read( 152 )) &&
hello.length==152 hello.length==152
@@ -41,14 +47,13 @@ module FlexNBD
end end
def send_request( type, handle="myhandle", from=0, len=0 ) def send_request( type, handle="myhandle", from=0, len=0, magic=REQUEST_MAGIC )
fail "Bad handle" unless handle.length == 8 fail "Bad handle" unless handle.length == 8
@sock.write( "\x25\x60\x95\x13" ) @sock.write( magic )
@sock.write( [type].pack( 'N' ) ) @sock.write( [type].pack( 'N' ) )
@sock.write( handle ) @sock.write( handle )
@sock.write( "\x0"*4 ) @sock.write( [n64( from )].pack( 'q' ) )
@sock.write( [from].pack( 'N' ) )
@sock.write( [len].pack( 'N' ) ) @sock.write( [len].pack( 'N' ) )
end end
@@ -90,16 +95,19 @@ module FlexNBD
def send_mirror def send_mirror
read_hello() read_hello()
write_write_request( 0, 8 ) write( 0, "12345678" )
write_data( "12345678" )
read_response()
write_entrust_request()
read_response() read_response()
write_disconnect_request() write_disconnect_request()
close() close()
end end
def write( from, data )
write_write_request( from, data.length )
write_data( data )
end
def read_response def read_response
magic = @sock.read(4) magic = @sock.read(4)
error_s = @sock.read(4) error_s = @sock.read(4)
@@ -113,10 +121,10 @@ module FlexNBD
end end
def ensure_disconnected def disconnected?
Timeout.timeout( 2 ) do result = nil
@sock.read(1) Timeout.timeout( 2 ) { result = ( @sock.read(1) == nil ) }
end result
end end
@@ -131,6 +139,22 @@ module FlexNBD
end end
end end
private
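# Reverse the byte order of a 64-bit number. pack('q') uses native byte
# order, so on a little-endian host this yields the big-endian offset that
# NBD expects. See:
# http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-core/11920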
def n64(b)
((b & 0xff00000000000000) >> 56) |
((b & 0x00ff000000000000) >> 40) |
((b & 0x0000ff0000000000) >> 24) |
((b & 0x000000ff00000000) >> 8) |
((b & 0x00000000ff000000) << 8) |
((b & 0x0000000000ff0000) << 24) |
((b & 0x000000000000ff00) << 40) |
((b & 0x00000000000000ff) << 56)
end
end # class FakeSource end # class FakeSource
end # module FlexNBD end # module FlexNBD
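`send_request` now writes the offset as a full 64-bit value. Previously it wrote four zero bytes followed by a 32-bit offset, which capped `from` at 4 GiB. Because `pack('q')` uses native byte order, the `n64` plus `pack('q')` combination produces big-endian bytes only on a little-endian host. Writing each byte explicitly removes that dependency. Below is a C sketch of the resulting 28-byte request header using that approach. The helper names are hypothetical, and the field order and magic are taken from `send_request`.

```c
#include <stdint.h>
#include <string.h>

#define NBD_REQUEST_MAGIC 0x25609513u /* "\x25\x60\x95\x13" */
#define NBD_REQUEST_LEN 28

/* Store v in big-endian (network) order, independent of host endianness. */
static void put_be32( unsigned char *p, uint32_t v )
{
	for ( int i = 3; i >= 0; i-- ) { p[i] = v & 0xff; v >>= 8; }
}

static void put_be64( unsigned char *p, uint64_t v )
{
	for ( int i = 7; i >= 0; i-- ) { p[i] = v & 0xff; v >>= 8; }
}

/* magic(4) type(4) handle(8) from(8) len(4), all integers big-endian. */
void nbd_encode_request( unsigned char out[NBD_REQUEST_LEN], uint32_t type,
                         const char handle[8], uint64_t from, uint32_t len )
{
	put_be32( out, NBD_REQUEST_MAGIC );
	put_be32( out + 4, type );
	memcpy( out + 8, handle, 8 );
	put_be64( out + 16, from );
	put_be32( out + 24, len );
}
```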


@@ -21,12 +21,18 @@ class TestDestErrorHandling < Test::Unit::TestCase
assert_no_control assert_no_control
end end
=begin
# This is disabled while CLIENT_MAX_WAIT_SECS is removed
def test_hello_goes_astray_causes_timeout_error def test_hello_goes_astray_causes_timeout_error
run_fake( "source/hang_after_hello" ) run_fake( "source/hang_after_hello" )
assert_no_control assert_no_control
end end
=end
def test_sigterm_has_bad_exit_status
@env.nbd1.can_die(1)
run_fake( "source/sigterm_after_hello" )
end
def test_disconnect_after_hello_causes_error_not_fatal def test_disconnect_after_hello_causes_error_not_fatal
run_fake( "source/close_after_hello" ) run_fake( "source/close_after_hello" )
@@ -58,10 +64,6 @@ class TestDestErrorHandling < Test::Unit::TestCase
run_fake( "source/close_after_write" ) run_fake( "source/close_after_write" )
end end
def test_disconnect_before_entrust_reply_causes_error
run_fake( "source/close_after_entrust" )
end
def test_disconnect_before_write_reply_causes_error def test_disconnect_before_write_reply_causes_error
# Note that this is an odd case: writing the reply doesn't fail. # Note that this is an odd case: writing the reply doesn't fail.
@@ -71,23 +73,16 @@ class TestDestErrorHandling < Test::Unit::TestCase
end end
def test_disconnect_after_entrust_reply_causes_error
def test_straight_migration
@env.nbd1.can_die(0) @env.nbd1.can_die(0)
# This fake runs a failed migration then a succeeding one, so we
# expect the destination to take control.
run_fake( "source/close_after_entrust_reply" )
assert_control
end
def test_cant_rebind_retries
run_fake( "source/successful_transfer" ) run_fake( "source/successful_transfer" )
end end
private private
def run_fake( name ) def run_fake( name )
@env.run_fake( name, @env.ip, @env.port1, @env.ip, @env.rebind_port1 ) @env.run_fake( name, @env.ip, @env.port1 )
assert @env.fake_reports_success, "#{name} failed." assert @env.fake_reports_success, "#{name} failed."
end end
@@ -105,3 +100,4 @@ class TestDestErrorHandling < Test::Unit::TestCase
end end
end # class TestDestErrorHandling end # class TestDestErrorHandling

@@ -64,24 +64,51 @@ class TestHappyPath < Test::Unit::TestCase
end end
def test_mirror def setup_to_mirror
@env.writefile1( "f"*4 ) @env.writefile1( "f"*4 )
@env.serve1 @env.serve1
@env.writefile2( "0"*4 ) @env.writefile2( "0"*4 )
@env.listen2 @env.listen2
end
def test_mirror
@env.nbd1.can_die @env.nbd1.can_die
@env.nbd2.can_die(0)
setup_to_mirror()
stdout, stderr = @env.mirror12 stdout, stderr = @env.mirror12
@env.nbd1.join @env.nbd1.join
@env.nbd2.join
assert( File.file?( @env.filename1 ),
"The source file was incorrectly deleted")
assert_equal(@env.file1.read_original( 0, @env.blocksize ), assert_equal(@env.file1.read_original( 0, @env.blocksize ),
@env.file2.read( 0, @env.blocksize ) ) @env.file2.read( 0, @env.blocksize ) )
assert @env.status2['has_control'], "destination didn't take control"
end end
def test_mirror_unlink
@env.nbd1.can_die(0)
@env.nbd2.can_die(0)
setup_to_mirror()
assert File.file?( @env.filename1 )
stdout, stderr = @env.mirror12_unlink
assert_no_match( /unrecognized/, stderr )
Timeout.timeout(2) do @env.nbd1.join end
assert !File.file?( @env.filename1 )
end
def test_write_to_high_block def test_write_to_high_block
# Create a large file, then try to write to somewhere after the 2G boundary # Create a large file, then try to write to somewhere after the 2G boundary
@env.truncate1 "4G" @env.truncate1 "4G"
@@ -92,4 +119,41 @@ class TestHappyPath < Test::Unit::TestCase
assert_equal "12345678", @env.nbd1.read( 2**31+2**29, 8 ) assert_equal "12345678", @env.nbd1.read( 2**31+2**29, 8 )
end end
def test_set_acl
# Just check that we get sane feedback here
@env.writefile1( "f"*4 )
@env.serve1
_,stderr = @env.acl1("127.0.0.1")
assert_no_match( /^(F|E):/, stderr )
end end
def test_write_more_than_one_run
one_mb = 2**20
data = "\0" * 256 * one_mb
File.open(@env.filename1, "wb") do |f| f.write( "1" * 256 * one_mb ) end
@env.serve1
sleep 5
@env.write1( data )
@env.nbd1.can_die(0)
@env.nbd1.kill
i = 0
File.open(@env.filename1, "rb") do |f|
while mb = f.read( one_mb )
unless "\0"*one_mb == mb
msg = "Read non-zeros after offset %x:\n"%(i * one_mb)
msg += `hexdump #{@env.filename1} | head -n5`
fail msg
end
i += 1
end
end
end
end
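`test_write_more_than_one_run` reads the file back in 1 MiB chunks and fails at the first chunk that is not entirely zero. The sketch below turns that scan into a reusable helper that returns the byte offset of the first dirty chunk. It uses `StringIO` so it runs without a file. The helper name `first_nonzero_chunk` is hypothetical. Unlike a comparison against a full-size `"\0" * one_mb` string, it also treats a short final chunk of zeros as clean.

```ruby
require "stringio"

# Hypothetical helper: return the byte offset of the first chunk of +io+
# that contains a non-zero byte, or nil if every chunk is all zeros.
def first_nonzero_chunk(io, chunk_size)
  offset = 0
  while (chunk = io.read(chunk_size))
    # A chunk is clean only if every byte in it is NUL.
    return offset unless chunk.count("\0") == chunk.bytesize
    offset += chunk.bytesize
  end
  nil
end

data = "\0" * 4096 + "\0" * 100 + "x" + "\0" * 3995
p first_nonzero_chunk(StringIO.new(data), 4096)        # prints 4096
p first_nonzero_chunk(StringIO.new("\0" * 8192), 4096) # prints nil
```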

@@ -0,0 +1,200 @@
require 'test/unit'
require 'environment'
require 'flexnbd/fake_source'
require 'flexnbd/fake_dest'
class TestProxyMode < Test::Unit::TestCase
def setup
super
@env = Environment.new
@env.writefile1( "0" * 16 )
end
def teardown
@env.cleanup
super
end
def with_proxied_client( override_size = nil )
@env.serve1 unless @server_up
@env.proxy2 unless @proxy_up
@env.nbd2.can_die(0)
client = FlexNBD::FakeSource.new(@env.ip, @env.port2, "Couldn't connect to proxy")
begin
result = client.read_hello
assert_equal "NBDMAGIC", result[:magic]
assert_equal override_size || @env.file1.size, result[:size]
yield client
ensure
client.close rescue nil
end
end
def test_exits_with_error_when_cannot_connect_to_upstream_on_start
assert_raises(RuntimeError) { @env.proxy1 }
end
def test_read_requests_successfully_proxied
with_proxied_client do |client|
(0..3).each do |n|
offset = n * 4096
client.write_read_request(offset, 4096, "myhandle")
rsp = client.read_response
assert_equal ::FlexNBD::REPLY_MAGIC, rsp[:magic]
assert_equal "myhandle", rsp[:handle]
assert_equal 0, rsp[:error]
orig_data = @env.file1.read(offset, 4096)
data = client.read_raw(4096)
assert_equal 4096, orig_data.size
assert_equal 4096, data.size
assert_equal( orig_data, data, "Returned data does not match" )
end
end
end
def test_write_requests_successfully_proxied
with_proxied_client do |client|
(0..3).each do |n|
offset = n * 4096
client.write(offset, "\xFF" * 4096)
rsp = client.read_response
assert_equal FlexNBD::REPLY_MAGIC, rsp[:magic]
assert_equal "myhandle", rsp[:handle]
assert_equal 0, rsp[:error]
data = @env.file1.read(offset, 4096)
assert_equal( ( "\xFF" * 4096 ), data, "Data not written correctly (offset is #{n})" )
end
end
end
def make_fake_server
server = FlexNBD::FakeDest.new(@env.ip, @env.port1)
@server_up = true
# We return a thread here because accept() and connect() both block for us
Thread.new do
sc = server.accept # just tell the supervisor we're up
sc.write_hello
[ server, sc ]
end
end
def test_read_request_retried_when_upstream_dies_partway
maker = make_fake_server
with_proxied_client(4096) do |client|
server, sc1 = maker.value
# Send the read request to the proxy
client.write_read_request( 0, 4096 )
# ensure we're given the read request
req1 = sc1.read_request
assert_equal ::FlexNBD::REQUEST_MAGIC, req1[:magic]
assert_equal ::FlexNBD::REQUEST_READ, req1[:type]
assert_equal 0, req1[:from]
assert_not_equal 0, req1[:len]
# Close the upstream connection now that we're sure the read request has been sent once
sc1.close
# We expect the proxy to reconnect without our client doing anything.
sc2 = server.accept
sc2.write_hello
# And once reconnected, it should resend an identical request.
req2 = sc2.read_request
assert_equal req1, req2
# The reply should be proxied back to the client.
sc2.write_reply( req2[:handle] )
sc2.write_data( "\xFF" * 4096 )
# Check it to make sure it's correct
rsp = Timeout.timeout(15) { client.read_response }
assert_equal ::FlexNBD::REPLY_MAGIC, rsp[:magic]
assert_equal 0, rsp[:error]
assert_equal req1[:handle], rsp[:handle]
data = client.read_raw( 4096 )
assert_equal( ("\xFF" * 4096), data, "Wrong data returned" )
sc2.close
server.close
end
end
def test_write_request_retried_when_upstream_dies_partway
maker = make_fake_server
with_proxied_client(4096) do |client|
server, sc1 = maker.value
# Send the write request to the proxy
client.write( 0, ( "\xFF" * 4096 ) )
# Ensure we're given the write request
req1 = sc1.read_request
assert_equal ::FlexNBD::REQUEST_MAGIC, req1[:magic]
assert_equal ::FlexNBD::REQUEST_WRITE, req1[:type]
assert_equal 0, req1[:from]
assert_equal 4096, req1[:len]
data1 = sc1.read_data( 4096 )
assert_equal( ( "\xFF" * 4096 ), data1, "Data not proxied successfully" )
# Close the upstream connection now that we're sure the write request has been sent once
sc1.close
# We expect the proxy to reconnect without our client doing anything.
sc2 = server.accept
sc2.write_hello
# And once reconnected, it should resend an identical request.
req2 = sc2.read_request
assert_equal req1, req2
data2 = sc2.read_data( 4096 )
assert_equal data1, data2
# The reply should be proxied back to the client.
sc2.write_reply( req2[:handle] )
# Check it to make sure it's correct
rsp = Timeout.timeout(15) { client.read_response }
assert_equal ::FlexNBD::REPLY_MAGIC, rsp[:magic]
assert_equal 0, rsp[:error]
assert_equal req1[:handle], rsp[:handle]
sc2.close
server.close
end
end
def test_only_one_client_can_connect_to_proxy_at_a_time
with_proxied_client do |client|
c2 = nil
assert_raises(Timeout::Error) do
Timeout.timeout(1) do
c2 = FlexNBD::FakeSource.new(@env.ip, @env.port2, "Couldn't connect to proxy (2)")
c2.read_hello
end
end
c2.close rescue nil if c2
end
end
end
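The retry tests assert two things. The request the proxy replays after reconnecting must be identical to the first one (`assert_equal req1, req2`), and the reply must carry the original handle. The sketch below shows the header encoding involved. It assumes the classic NBD wire format: a 28-byte request and a 16-byte reply, all big-endian, with the standard NBD magic numbers and READ = 0, WRITE = 1. The constant and method names are hypothetical stand-ins for whatever `FlexNBD::FakeSource` and `FakeDest` use internally.

```ruby
# Assumed NBD constants (standard protocol values; names are illustrative).
REQUEST_MAGIC = 0x25609513
REPLY_MAGIC   = 0x67446698
REQUEST_READ  = 0
REQUEST_WRITE = 1

# 28-byte request: magic, type (u32), handle (8 raw bytes), from (u64), len (u32).
def pack_request(type, handle, from, len)
  [REQUEST_MAGIC, type, handle, from, len].pack("NNa8Q>N")
end

def unpack_request(bytes)
  magic, type, handle, from, len = bytes.unpack("NNa8Q>N")
  { magic: magic, type: type, handle: handle, from: from, len: len }
end

# 16-byte reply: magic, error (u32), and the handle echoed from the request.
def pack_reply(handle, error = 0)
  [REPLY_MAGIC, error, handle].pack("NNa8")
end

def unpack_reply(bytes)
  magic, error, handle = bytes.unpack("NNa8")
  { magic: magic, error: error, handle: handle }
end

# A replayed request decodes to the same hash as the original, so a
# comparison like the tests' assert_equal(req1, req2) holds.
req1 = unpack_request(pack_request(REQUEST_READ, "myhandle", 0, 4096))
req2 = unpack_request(pack_request(REQUEST_READ, "myhandle", 0, 4096))
puts req1 == req2                                             # prints true
puts pack_request(REQUEST_READ, "myhandle", 0, 4096).bytesize # prints 28
rsp = unpack_reply(pack_reply(req1[:handle]))
puts rsp[:handle] == req1[:handle]                            # prints true
```

Because the handle is opaque to the server and simply echoed back, the proxy can resend the same bytes after a reconnect. The client then cannot tell that a retry happened.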

@@ -0,0 +1,109 @@
require 'test/unit'
require 'environment'
require 'flexnbd/fake_source'
class TestServeMode < Test::Unit::TestCase
def setup
super
@env = Environment.new
@env.writefile1( "0" )
@env.serve1
end
def teardown
@env.cleanup
super
end
def connect_to_server
client = FlexNBD::FakeSource.new(@env.ip, @env.port1, "Connecting to server failed")
begin
result = client.read_hello
assert_equal "NBDMAGIC", result[:magic]
assert_equal @env.file1.size, result[:size]
yield client
ensure
client.close rescue nil
end
end
def test_bad_request_magic_receives_error_response
connect_to_server do |client|
# replace REQUEST_MAGIC with all 0s to make it look bad
client.send_request( 0, "myhandle", 0, 0, "\x00\x00\x00\x00" )
rsp = client.read_response
assert_equal FlexNBD::REPLY_MAGIC, rsp[:magic]
assert_equal "myhandle", rsp[:handle]
assert rsp[:error] != 0, "Server sent success reply back: #{rsp[:error]}"
# The client should be disconnected now
assert client.disconnected?, "Server not disconnected"
end
end
def test_long_write_on_top_of_short_write_is_respected
connect_to_server do |client|
# Start with a file of all-zeroes.
client.write( 0, "\x00" * @env.file1.size )
rsp = client.read_response
assert_equal FlexNBD::REPLY_MAGIC, rsp[:magic]
assert_equal 0, rsp[:error]
client.write( 0, "\xFF" )
rsp = client.read_response
assert_equal FlexNBD::REPLY_MAGIC, rsp[:magic]
assert_equal 0, rsp[:error]
client.write( 0, "\xFF\xFF" )
rsp = client.read_response
assert_equal FlexNBD::REPLY_MAGIC, rsp[:magic]
assert_equal 0, rsp[:error]
end
assert_equal "\xFF\xFF", @env.file1.read( 0, 2 )
end
def test_read_request_out_of_bounds_receives_error_response
connect_to_server do |client|
client.write_read_request( @env.file1.size, 4096 )
rsp = client.read_response
assert_equal FlexNBD::REPLY_MAGIC, rsp[:magic]
assert_equal "myhandle", rsp[:handle]
assert rsp[:error] != 0, "Server sent success reply back: #{rsp[:error]}"
# Ensure we're not disconnected by sending a request. We don't care about
# whether the reply is good or not, here.
client.write_read_request( 0, 4096 )
rsp = client.read_response
assert_equal FlexNBD::REPLY_MAGIC, rsp[:magic]
end
end
def test_write_request_out_of_bounds_receives_error_response
connect_to_server do |client|
client.write( @env.file1.size, "\x00" * 4096 )
rsp = client.read_response
assert_equal FlexNBD::REPLY_MAGIC, rsp[:magic]
assert_equal "myhandle", rsp[:handle]
assert rsp[:error] != 0, "Server sent success reply back: #{rsp[:error]}"
# Ensure we're not disconnected by sending a request. We don't care about
# whether the reply is good or not, here.
client.write( 0, "\x00" * @env.file1.size )
rsp = client.read_response
assert_equal FlexNBD::REPLY_MAGIC, rsp[:magic]
end
end
end
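The serve-mode tests expect a read or write starting at `@env.file1.size` to get an error reply while the connection stays open for further requests. The sketch below shows the kind of bounds predicate those tests exercise. It assumes a request is valid only when `from + len` does not exceed the export size. The method name is hypothetical, and FlexNBD's own check lives in its C code, which is not shown here.

```ruby
# Hypothetical predicate: does a request for +len+ bytes at +from+ fit
# inside an export of +size+ bytes? Written with a subtraction, which is
# the form that avoids from + len overflowing when ported to fixed-width
# C integers (Ruby integers themselves cannot overflow).
def request_in_bounds?(from, len, size)
  from <= size && len <= size - from
end

size = 1 # the tests above serve a one-byte file
puts request_in_bounds?(0, 1, size)       # prints true
puts request_in_bounds?(size, 4096, size) # prints false
puts request_in_bounds?(0, 4096, size)    # prints false
```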

@@ -19,12 +19,24 @@ class TestSourceErrorHandling < Test::Unit::TestCase
end end
def expect_term_during_migration
@env.nbd1.can_die(1,9)
end
def test_failure_to_connect_reported_in_mirror_cmd_response def test_failure_to_connect_reported_in_mirror_cmd_response
stdout, stderr = @env.mirror12_unchecked stdout, stderr = @env.mirror12_unchecked
expect_term_during_migration
assert_match( /failed to connect/, stderr ) assert_match( /failed to connect/, stderr )
end end
def test_sigterm_after_hello_quits_with_status_of_1
expect_term_during_migration
run_fake( "dest/sigterm_after_hello" )
end
def test_destination_hangs_after_connect_reports_error_at_source def test_destination_hangs_after_connect_reports_error_at_source
run_fake( "dest/hang_after_connect", run_fake( "dest/hang_after_connect",
:err => /Remote server failed to respond/ ) :err => /Remote server failed to respond/ )
@@ -36,6 +48,7 @@ class TestSourceErrorHandling < Test::Unit::TestCase
:err => /Mirror was rejected/ ) :err => /Mirror was rejected/ )
end end
def test_wrong_size_causes_disconnect def test_wrong_size_causes_disconnect
run_fake( "dest/hello_wrong_size", run_fake( "dest/hello_wrong_size",
:err => /Remote size does not match local size/ ) :err => /Remote size does not match local size/ )
@@ -43,57 +56,57 @@ class TestSourceErrorHandling < Test::Unit::TestCase
def test_wrong_magic_causes_disconnect def test_wrong_magic_causes_disconnect
expect_term_during_migration
run_fake( "dest/hello_wrong_magic", run_fake( "dest/hello_wrong_magic",
:err => /Mirror was rejected/ ) :err => /Mirror was rejected/ )
end end
def test_disconnect_after_hello_causes_retry def test_disconnect_after_hello_causes_retry
expect_term_during_migration
run_fake( "dest/close_after_hello", run_fake( "dest/close_after_hello",
:out => /Mirror started/ ) :out => /Mirror started/ )
end end
def test_write_times_out_causes_retry def test_write_times_out_causes_retry
expect_term_during_migration
run_fake( "dest/hang_after_write" ) run_fake( "dest/hang_after_write" )
end end
def test_rejected_write_causes_retry def test_rejected_write_causes_retry
expect_term_during_migration
run_fake( "dest/error_on_write" ) run_fake( "dest/error_on_write" )
end end
def test_disconnect_before_write_reply_causes_retry def test_disconnect_before_write_reply_causes_retry
expect_term_during_migration
run_fake( "dest/close_after_write" ) run_fake( "dest/close_after_write" )
end end
def test_bad_write_reply_causes_retry def test_bad_write_reply_causes_retry
expect_term_during_migration
run_fake( "dest/write_wrong_magic" ) run_fake( "dest/write_wrong_magic" )
end end
def test_pre_entrust_disconnect_causes_retry def test_pre_entrust_disconnect_causes_retry
expect_term_during_migration
run_fake( "dest/close_after_writes" ) run_fake( "dest/close_after_writes" )
end end
def test_post_entrust_disconnect_causes_retry def test_cancel_migration
@env.nbd1.can_die(0) run_fake( "dest/break_after_hello" )
run_fake( "dest/close_after_entrust" )
end end
def test_entrust_error_causes_retry
run_fake( "dest/error_on_entrust" )
end
private private
def run_fake(name, opts = {}) def run_fake(name, opts = {})
@env.run_fake( name, @env.ip, @env.port2 ) @env.run_fake( name, @env.ip, @env.port2, @env.nbd1.ctrl )
stdout, stderr = @env.mirror12_unchecked stdout, stderr = @env.mirror12_unchecked
assert_success assert_success
assert_match( opts[:err], stderr ) if opts[:err] assert_match( opts[:err], stderr ) if opts[:err]

@@ -0,0 +1,171 @@
#!/usr/bin/env ruby
require 'test/unit'
require 'flexnbd/fake_source'
require 'socket'
require 'fileutils'
require 'tmpdir'
require 'timeout'
Thread.abort_on_exception = true
class TestWriteDuringMigration < Test::Unit::TestCase
def setup
@flexnbd = File.expand_path("../../build/flexnbd")
raise "No binary!" unless File.executable?( @flexnbd )
@size = 20*1024*1024 # 20MB
@write_data = "foo!" * 2048 # 8K write
@source_port = 9990
@dest_port = 9991
@source_sock = "src.sock"
@dest_sock = "dst.sock"
@source_file = "src.file"
@dest_file = "dst.file"
end
def teardown
[@dst_proc, @src_proc].each do |pid|
if pid
Process.kill( "KILL", pid ) rescue nil
end
end
end
def debug_arg
ENV['DEBUG'] ? "--verbose" : ""
end
def launch_servers
@dst_proc = fork() {
cmd = "#{@flexnbd} listen -l 127.0.0.1 -p #{@dest_port} -f #{@dest_file} -s #{@dest_sock} #{debug_arg}"
exec cmd
}
@src_proc = fork() {
cmd = "#{@flexnbd} serve -l 127.0.0.1 -p #{@source_port} -f #{@source_file} -s #{@source_sock} #{debug_arg}"
exec cmd
}
begin
awaiting = nil
Timeout.timeout(10) do
awaiting = :source
sleep 0.1 while !File.exist?( @source_sock )
awaiting = :dest
sleep 0.1 while !File.exist?( @dest_sock )
end
rescue Timeout::Error
case awaiting
when :source
fail "Couldn't get a source socket."
when :dest
fail "Couldn't get a destination socket."
else
fail "Something went wrong I don't understand."
end
end
end
def make_files()
FileUtils.touch(@source_file)
File.truncate(@source_file, @size)
FileUtils.touch(@dest_file)
File.truncate(@dest_file, @size)
File.open(@source_file, "wb"){|f| f.write "a"*@size }
end
def start_mirror
UNIXSocket.open(@source_sock) {|sock|
sock.write(["mirror", "127.0.0.1", @dest_port.to_s, "exit"].join("\x0A") + "\x0A\x0A")
sock.flush
rsp = sock.readline
}
end
def wait_for_quit()
Timeout.timeout( 10 ) do
start_time = Time.now
dst_result = Process::waitpid2(@dst_proc)
src_result = Process::waitpid2(@src_proc)
end
end
def source_writer
client = FlexNBD::FakeSource.new( "127.0.0.1", @source_port, "Timed out connecting" )
offsets = Range.new(0, (@size - @write_data.size) / 4096 ).to_a
loop do
begin
client.write(offsets[rand(offsets.size)] * 4096, @write_data)
rescue => err
# We expect a broken write at some point, so ignore it
break
end
end
end
def assert_both_sides_identical
# puts `md5sum #{@source_file} #{@dest_file}`
# Ensure each block matches
File.open(@source_file, "r") do |source|
File.open(@dest_file, "r") do |dest|
# read() already advances each file by one block, so no extra seek is needed
0.upto( @size / 4096 - 1 ) do |block_num|
s_data = source.read( 4096 )
d_data = dest.read( 4096 )
assert s_data == d_data, "Block #{block_num} mismatch!"
end
end
end
end
def test_write_during_migration
Dir.mktmpdir() do |tmpdir|
Dir.chdir( tmpdir ) do
make_files()
launch_servers()
src_writer = Thread.new { source_writer }
start_mirror()
wait_for_quit()
src_writer.join
assert_both_sides_identical
end
end
end
def test_many_clients_during_migration
Dir.mktmpdir() do |tmpdir|
Dir.chdir( tmpdir ) do
make_files()
launch_servers()
src_writers_1 = (1..5).collect { Thread.new { source_writer } }
start_mirror()
src_writers_2 = (1..5).collect { Thread.new { source_writer } }
wait_for_quit()
( src_writers_1 + src_writers_2 ).each {|t| t.join }
assert_both_sides_identical
end
end end
end
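`assert_both_sides_identical` checks the source and destination files block by block after migration. The sketch below performs that comparison over two IO objects. It reads each 4 KiB block exactly once and collects the indices of every mismatching block instead of stopping at the first, which can make a failed migration easier to diagnose. The helper name `mismatched_blocks` is hypothetical, and `StringIO` stands in for the real files.

```ruby
require "stringio"

# Hypothetical helper: compare two streams in fixed-size blocks and return
# the indices of blocks that differ. Each read advances both streams by one
# block, so no seek is needed between iterations.
def mismatched_blocks(a, b, block_size = 4096)
  mismatches = []
  index = 0
  loop do
    a_data = a.read(block_size)
    b_data = b.read(block_size)
    break if a_data.nil? && b_data.nil? # both streams exhausted
    mismatches << index unless a_data == b_data
    index += 1
  end
  mismatches
end

src = "a" * (4096 * 3)
dst = "a" * 4096 + "b" + "a" * (4096 * 2 - 1)
p mismatched_blocks(StringIO.new(src), StringIO.new(dst)) # prints [1]
```

If one stream is shorter than the other, the missing blocks are reported as mismatches because `read` returns `nil` for that side.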

@@ -29,7 +29,7 @@ end
@local = File.open(testname_local, "r+") @local = File.open(testname_local, "r+")
@serve = FlexNBD.new(binary, "127.0.0.1", 41234) @serve = FlexNBD::FlexNBD.new(binary, "127.0.0.1", 41234)
@serve.serve(testname_serve) @serve.serve(testname_serve)
$record = [] $record = []

@@ -2,6 +2,11 @@
#include "bitset.h" #include "bitset.h"
#define assert_bitset_is( map, val ) {\
uint64_t *num = (uint64_t*) map->bits; \
ck_assert_int_eq( val, *num ); \
}
START_TEST(test_bit_set) START_TEST(test_bit_set)
{ {
uint64_t num = 0; uint64_t num = 0;
@@ -95,7 +100,7 @@ START_TEST(test_bit_runs)
ptr = 0; ptr = 0;
for (i=0; i < 20; i += 1) { for (i=0; i < 20; i += 1) {
int run = bit_run_count(buffer, ptr, 2048-ptr); int run = bit_run_count(buffer, ptr, 2048-ptr, NULL);
fail_unless( fail_unless(
run == runs[i], run == runs[i],
"run %d should have been %d, was %d", "run %d should have been %d, was %d",
@@ -108,7 +113,7 @@ END_TEST
START_TEST(test_bitset) START_TEST(test_bitset)
{ {
struct bitset_mapping* map; struct bitset * map;
uint64_t *num; uint64_t *num;
map = bitset_alloc(6400, 100); map = bitset_alloc(6400, 100);
@@ -143,23 +148,40 @@ END_TEST
START_TEST( test_bitset_set ) START_TEST( test_bitset_set )
{ {
struct bitset_mapping* map; struct bitset * map;
uint64_t *num; uint64_t run;
map = bitset_alloc(64, 1); map = bitset_alloc(64, 1);
num = (uint64_t*) map->bits;
ck_assert_int_eq( 0x0000000000000000, *num ); assert_bitset_is( map, 0x0000000000000000 );
bitset_set( map ); bitset_set( map );
ck_assert_int_eq( 0xffffffffffffffff, *num ); assert_bitset_is( map, 0xffffffffffffffff );
bitset_free( map );
map = bitset_alloc( 6400, 100 );
assert_bitset_is( map, 0x0000000000000000 );
bitset_set( map );
assert_bitset_is( map, 0xffffffffffffffff );
bitset_free( map );
// Now do something large and representative
map = bitset_alloc( 53687091200, 4096 );
bitset_set( map );
run = bitset_run_count( map, 0, 53687091200 );
ck_assert_int_eq( run, 53687091200 );
bitset_free( map );
} }
END_TEST END_TEST
START_TEST( test_bitset_clear ) START_TEST( test_bitset_clear )
{ {
struct bitset_mapping* map; struct bitset * map;
uint64_t *num; uint64_t *num;
uint64_t run;
map = bitset_alloc(64, 1); map = bitset_alloc(64, 1);
num = (uint64_t*) map->bits; num = (uint64_t*) map->bits;
@@ -168,26 +190,300 @@ START_TEST( test_bitset_clear )
bitset_set( map ); bitset_set( map );
bitset_clear( map ); bitset_clear( map );
ck_assert_int_eq( 0x0000000000000000, *num ); ck_assert_int_eq( 0x0000000000000000, *num );
bitset_free( map );
// Now do something large and representative
map = bitset_alloc( 53687091200, 4096 );
bitset_set( map );
bitset_clear( map );
run = bitset_run_count( map, 0, 53687091200 );
ck_assert_int_eq( run, 53687091200 );
bitset_free( map );
} }
END_TEST END_TEST
START_TEST( test_bitset_set_range )
{
struct bitset* map = bitset_alloc( 64, 1 );
assert_bitset_is( map, 0x0000000000000000 );
bitset_set_range( map, 8, 8 );
assert_bitset_is( map, 0x000000000000ff00 );
bitset_clear( map );
assert_bitset_is( map, 0x0000000000000000 );
bitset_set_range( map, 0, 0 );
assert_bitset_is( map, 0x0000000000000000 );
bitset_free( map );
}
END_TEST
START_TEST( test_bitset_clear_range )
{
struct bitset* map = bitset_alloc( 64, 1 );
bitset_set( map );
assert_bitset_is( map, 0xffffffffffffffff );
bitset_clear_range( map, 8, 8 );
assert_bitset_is( map, 0xffffffffffff00ff );
bitset_set( map );
assert_bitset_is( map, 0xffffffffffffffff );
bitset_clear_range( map, 0, 0 );
assert_bitset_is( map, 0xffffffffffffffff );
bitset_free( map );
}
END_TEST
START_TEST( test_bitset_run_count )
{
struct bitset* map = bitset_alloc( 64, 1 );
uint64_t run;
assert_bitset_is( map, 0x0000000000000000 );
run = bitset_run_count( map, 0, 64 );
ck_assert_int_eq( 64, run );
bitset_set_range( map, 0, 32 );
assert_bitset_is( map, 0x00000000ffffffff );
run = bitset_run_count( map, 0, 64 );
ck_assert_int_eq( 32, run );
run = bitset_run_count( map, 0, 16 );
ck_assert_int_eq( 16, run );
run = bitset_run_count( map, 16, 64 );
ck_assert_int_eq( 16, run );
run = bitset_run_count( map, 31, 64 );
ck_assert_int_eq( 1, run );
run = bitset_run_count( map, 32, 64 );
ck_assert_int_eq( 32, run );
run = bitset_run_count( map, 32, 32 );
ck_assert_int_eq( 32, run );
run = bitset_run_count( map, 32, 16 );
ck_assert_int_eq( 16, run );
bitset_free( map );
map = bitset_alloc( 6400, 100 );
assert_bitset_is( map, 0x0000000000000000 );
run = bitset_run_count( map, 0, 6400 );
ck_assert_int_eq( 6400, run );
bitset_set_range( map, 0, 3200 );
run = bitset_run_count( map, 0, 6400 );
ck_assert_int_eq( 3200, run );
run = bitset_run_count( map, 1, 6400 );
ck_assert_int_eq( 3199, run );
run = bitset_run_count( map, 3200, 6400 );
ck_assert_int_eq( 3200, run );
run = bitset_run_count( map, 6500, 6400 );
ck_assert_int_eq( 0, run );
bitset_free( map );
// Now do something large and representative
map = bitset_alloc( 53687091200, 4096 );
bitset_set( map );
run = bitset_run_count( map, 0, 53687091200 );
ck_assert_int_eq( run, 53687091200 );
bitset_free( map );
}
END_TEST
START_TEST( test_bitset_set_range_doesnt_push_to_stream )
{
struct bitset *map = bitset_alloc( 64, 1 );
bitset_set_range( map, 0, 64 );
ck_assert_int_eq( 0, bitset_stream_size( map ) );
bitset_free( map );
}
END_TEST
START_TEST( test_bitset_clear_range_doesnt_push_to_stream )
{
struct bitset *map = bitset_alloc( 64, 1 );
bitset_clear_range( map, 0, 64 );
ck_assert_int_eq( 0, bitset_stream_size( map ) );
bitset_free( map );
}
END_TEST
START_TEST(test_bitset_enable_stream)
{
struct bitset *map = bitset_alloc( 64, 1 );
struct bitset_stream_entry result;
memset( &result, 0, sizeof( result ) );
bitset_enable_stream( map );
ck_assert_int_eq( 1, map->stream_enabled );
bitset_stream_dequeue( map, &result );
ck_assert_int_eq( BITSET_STREAM_ON, result.event );
ck_assert_int_eq( 0, result.from );
ck_assert_int_eq( 64, result.len );
bitset_free( map );
}
END_TEST
START_TEST(test_bitset_disable_stream)
{
struct bitset *map = bitset_alloc( 64, 1 );
struct bitset_stream_entry result;
memset( &result, 0, sizeof( result ) );
bitset_enable_stream( map );
bitset_disable_stream( map );
ck_assert_int_eq( 0, map->stream_enabled );
ck_assert_int_eq( 2, bitset_stream_size( map ) );
bitset_stream_dequeue( map, NULL ); // ON
bitset_stream_dequeue( map, &result ); // OFF
ck_assert_int_eq( BITSET_STREAM_OFF, result.event );
ck_assert_int_eq( 0, result.from );
ck_assert_int_eq( 64, result.len );
bitset_free( map );
}
END_TEST
START_TEST(test_bitset_stream_with_set_range)
{
struct bitset *map = bitset_alloc( 64, 1 );
struct bitset_stream_entry result;
memset( &result, 0, sizeof( result ) );
bitset_enable_stream( map );
bitset_set_range( map, 0, 32 );
ck_assert_int_eq( 2, bitset_stream_size( map ) );
bitset_stream_dequeue( map, NULL ); // ON
bitset_stream_dequeue( map, &result ); // SET
ck_assert_int_eq( BITSET_STREAM_SET, result.event );
ck_assert_int_eq( 0, result.from );
ck_assert_int_eq( 32, result.len );
bitset_free( map );
}
END_TEST
START_TEST(test_bitset_stream_with_clear_range)
{
struct bitset *map = bitset_alloc( 64, 1 );
struct bitset_stream_entry result;
memset( &result, 0, sizeof( result ) );
bitset_enable_stream( map );
bitset_clear_range( map, 0, 32 );
ck_assert_int_eq( 2, bitset_stream_size( map ) );
bitset_stream_dequeue( map, NULL ); // ON
bitset_stream_dequeue( map, &result ); // UNSET
ck_assert_int_eq( BITSET_STREAM_UNSET, result.event );
ck_assert_int_eq( 0, result.from );
ck_assert_int_eq( 32, result.len );
bitset_free( map );
}
END_TEST
START_TEST(test_bitset_stream_size)
{
struct bitset *map = bitset_alloc( 64, 1 );
bitset_enable_stream( map );
bitset_set_range( map, 0, 32 );
bitset_set_range( map, 16, 32 );
bitset_set_range( map, 7, 16 );
bitset_clear_range( map, 0, 32 );
bitset_clear_range( map, 16, 32 );
bitset_clear_range( map, 48, 16 );
bitset_disable_stream( map );
ck_assert_int_eq( 8, bitset_stream_size( map ) );
bitset_free( map );
}
END_TEST
START_TEST(test_bitset_stream_queued_bytes)
{
struct bitset *map = bitset_alloc( 64, 1 );
bitset_enable_stream( map );
bitset_set_range( map, 0, 32 );
bitset_set_range( map, 16, 32 );
bitset_set_range( map, 7, 16 );
bitset_clear_range( map, 0, 32 );
bitset_clear_range( map, 16, 32 );
bitset_clear_range( map, 48, 16 );
bitset_clear_range( map, 0, 2 );
bitset_disable_stream( map );
ck_assert_int_eq( 64, bitset_stream_queued_bytes( map, BITSET_STREAM_ON ) );
ck_assert_int_eq( 80, bitset_stream_queued_bytes( map, BITSET_STREAM_SET ) );
ck_assert_int_eq( 82, bitset_stream_queued_bytes( map, BITSET_STREAM_UNSET ) );
ck_assert_int_eq( 64, bitset_stream_queued_bytes( map, BITSET_STREAM_OFF ) );
bitset_free( map );
}
END_TEST
Suite* bitset_suite(void) Suite* bitset_suite(void)
{ {
Suite *s = suite_create("bitset"); Suite *s = suite_create("bitset");
TCase *tc_bit = tcase_create("bit"); TCase *tc_bit = tcase_create("bit");
TCase *tc_bitset = tcase_create("bitset");
tcase_add_test(tc_bit, test_bit_set); tcase_add_test(tc_bit, test_bit_set);
tcase_add_test(tc_bit, test_bit_clear); tcase_add_test(tc_bit, test_bit_clear);
tcase_add_test(tc_bit, test_bit_tests); tcase_add_test(tc_bit, test_bit_tests);
tcase_add_test(tc_bit, test_bit_ranges); tcase_add_test(tc_bit, test_bit_ranges);
tcase_add_test(tc_bit, test_bit_runs); tcase_add_test(tc_bit, test_bit_runs);
suite_add_tcase(s, tc_bit);
TCase *tc_bitset = tcase_create("bitset");
tcase_add_test(tc_bitset, test_bitset); tcase_add_test(tc_bitset, test_bitset);
tcase_add_test(tc_bitset, test_bitset_set); tcase_add_test(tc_bitset, test_bitset_set);
tcase_add_test(tc_bitset, test_bitset_clear); tcase_add_test(tc_bitset, test_bitset_clear);
suite_add_tcase(s, tc_bit); tcase_add_test(tc_bitset, test_bitset_run_count);
tcase_add_test(tc_bitset, test_bitset_set_range);
tcase_add_test(tc_bitset, test_bitset_clear_range);
tcase_add_test(tc_bitset, test_bitset_set_range_doesnt_push_to_stream);
tcase_add_test(tc_bitset, test_bitset_clear_range_doesnt_push_to_stream);
suite_add_tcase(s, tc_bitset); suite_add_tcase(s, tc_bitset);
TCase *tc_bitset_stream = tcase_create("bitset_stream");
tcase_add_test(tc_bitset_stream, test_bitset_enable_stream);
tcase_add_test(tc_bitset_stream, test_bitset_disable_stream);
tcase_add_test(tc_bitset_stream, test_bitset_stream_with_set_range);
tcase_add_test(tc_bitset_stream, test_bitset_stream_with_clear_range);
tcase_add_test(tc_bitset_stream, test_bitset_stream_size);
tcase_add_test(tc_bitset_stream, test_bitset_stream_queued_bytes);
suite_add_tcase(s, tc_bitset_stream);
return s; return s;
} }
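The bitset tests above pin down the expected semantics fairly precisely:

- Each bit covers `resolution` bytes.
- `bitset_run_count(map, from, len)` returns how many bytes starting at `from` share the same state, capped at `len` and at the end of the map.
- While streaming is enabled, every range operation queues one event.
- `bitset_stream_queued_bytes` sums the byte lengths of queued events of a given type rather than counting the events.

The toy Ruby model below is inferred from those test expectations only. It is not FlexNBD's C implementation, and the class and method names are hypothetical.

```ruby
# Toy model of the bitset semantics exercised by the C tests above.
# Hypothetical names; inferred from the test expectations, not FlexNBD code.
class ToyBitset
  def initialize(size, resolution)
    @size = size
    @resolution = resolution
    @bits = Array.new((size + resolution - 1) / resolution, false)
    @events = []           # queued [type, from, len] stream entries
    @stream_enabled = false
  end

  def enable_stream
    @stream_enabled = true
    @events << [:on, 0, @size]
  end

  def disable_stream
    @events << [:off, 0, @size]
    @stream_enabled = false
  end

  def set_range(from, len)
    mark(from, len, true)
    @events << [:set, from, len] if @stream_enabled
  end

  def clear_range(from, len)
    mark(from, len, false)
    @events << [:unset, from, len] if @stream_enabled
  end

  # Bytes from +from+ sharing the state of the bit that covers +from+,
  # capped at +len+ and at the end of the map.
  def run_count(from, len)
    return 0 if from >= @size
    first = from / @resolution
    state = @bits[first]
    run = (first + 1) * @resolution - from # rest of the first bit's bytes
    i = first + 1
    while i < @bits.size && @bits[i] == state
      run += @resolution
      i += 1
    end
    [run, len, @size - from].min
  end

  def stream_size
    @events.size
  end

  # Sum of the byte lengths of queued events of one type (not an event count).
  def queued_bytes(type)
    @events.select { |e| e[0] == type }.sum { |e| e[2] }
  end

  private

  # Set or clear every bit overlapping the bytes [from, from + len).
  def mark(from, len, value)
    return if len <= 0
    first = from / @resolution
    last = (from + len - 1) / @resolution
    (first..last).each { |i| @bits[i] = value }
  end
end

map = ToyBitset.new(64, 1)
map.enable_stream
map.set_range(0, 32)
map.set_range(16, 32)
map.set_range(7, 16)
map.clear_range(0, 32)
map.clear_range(16, 32)
map.clear_range(48, 16)
map.clear_range(0, 2)
map.disable_stream
p map.stream_size          # prints 9
p map.queued_bytes(:set)   # prints 80
p map.queued_bytes(:unset) # prints 82
```

Summing lengths rather than counting entries is what makes a figure like "bytes remaining to mirror" meaningful. A single queued event can stand for anything from one byte to the whole device.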

@@ -7,15 +7,12 @@ START_TEST( test_listening_assigns_sock )
{ {
struct flexnbd * flexnbd = flexnbd_create_listening( struct flexnbd * flexnbd = flexnbd_create_listening(
"127.0.0.1", "127.0.0.1",
NULL,
"4777", "4777",
NULL,
"fakefile", "fakefile",
"fakesock", "fakesock",
0, 0,
0, 0,
NULL, NULL );
1 );
fail_if( NULL == flexnbd->control->socket_name, "No socket was copied" ); fail_if( NULL == flexnbd->control->socket_name, "No socket was copied" );
} }
END_TEST END_TEST

@@ -1,57 +0,0 @@
#include "serve.h"
#include "listen.h"
#include "util.h"
#include "flexnbd.h"
#include <check.h>
#include <string.h>
START_TEST( test_defaults_main_serve_opts )
{
struct flexnbd flexnbd;
struct listen * listen = listen_create( &flexnbd, "127.0.0.1", NULL, "4777", NULL,
"foo", 0, 0, NULL, 1 );
NULLCHECK( listen );
struct server *init_serve = listen->init_serve;
struct server *main_serve = listen->main_serve;
NULLCHECK( init_serve );
NULLCHECK( main_serve );
fail_unless( 0 == memcmp(&init_serve->bind_to,
&main_serve->bind_to,
sizeof( union mysockaddr )),
"Main serve bind_to was not set" );
}
END_TEST
Suite* listen_suite(void)
{
Suite *s = suite_create("listen");
TCase *tc_create = tcase_create("create");
tcase_add_exit_test(tc_create, test_defaults_main_serve_opts, 0);
suite_add_tcase(s, tc_create);
return s;
}
#ifdef DEBUG
# define LOG_LEVEL 0
#else
# define LOG_LEVEL 2
#endif
int main(void)
{
log_level = LOG_LEVEL;
int number_failed;
Suite *s = listen_suite();
SRunner *sr = srunner_create(s);
srunner_run_all(sr, CK_NORMAL);
number_failed = srunner_ntests_failed(sr);
srunner_free(sr);
return (number_failed == 0) ? 0 : 1;
}

@@ -9,6 +9,9 @@
#include <errno.h> #include <errno.h>
#include <stdlib.h> #include <stdlib.h>
#include <string.h> #include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/socket.h> #include <sys/socket.h>
#include <sys/un.h> #include <sys/un.h>
@@ -56,6 +59,7 @@ void * responder( void *respond_uncast )
else { else {
fd_write_reply( sock_fd, resp->received.handle, 0 ); fd_write_reply( sock_fd, resp->received.handle, 0 );
} }
write( sock_fd, "12345678", 8 );
} }
return NULL; return NULL;
} }
@@ -85,14 +89,16 @@ void respond_destroy( struct respond * respond ){
} }
void * entruster( void * nothing __attribute__((unused))) void * reader( void * nothing __attribute__((unused)))
{ {
DECLARE_ERROR_CONTEXT( error_context ); DECLARE_ERROR_CONTEXT( error_context );
error_set_handler( (cleanup_handler *)error_marker, error_context ); error_set_handler( (cleanup_handler *)error_marker, error_context );
struct respond * respond = respond_create( 1 ); struct respond * respond = respond_create( 1 );
int devnull = open("/dev/null", O_WRONLY);
char outbuf[8] = {0};
socket_nbd_entrust( respond->sock_fds[0] ); socket_nbd_read( respond->sock_fds[0], 0, 8, devnull, outbuf, 1 );
return NULL; return NULL;
} }
@@ -101,13 +107,14 @@ START_TEST( test_rejects_mismatched_handle )
{ {
error_init(); error_init();
pthread_t entruster_thread; pthread_t reader_thread;
log_level=5; log_level=5;
marker = 0; marker = 0;
pthread_create( &entruster_thread, NULL, entruster, NULL ); pthread_create( &reader_thread, NULL, reader, NULL );
FATAL_UNLESS( 0 == pthread_join( entruster_thread, NULL ), "pthread_join failed"); FATAL_UNLESS( 0 == pthread_join( reader_thread, NULL ),
"pthread_join failed");
log_level=2; log_level=2;
@@ -120,19 +127,10 @@ START_TEST( test_accepts_matched_handle )
{ {
struct respond * respond = respond_create( 0 ); struct respond * respond = respond_create( 0 );
socket_nbd_entrust( respond->sock_fds[0] ); int devnull = open("/dev/null", O_WRONLY);
char outbuf[8] = {0};
respond_destroy( respond ); socket_nbd_read( respond->sock_fds[0], 0, 8, devnull, outbuf, 1 );
}
END_TEST
START_TEST( test_entrust_type_sent )
{
struct respond * respond = respond_create( 0 );
socket_nbd_entrust( respond->sock_fds[0] );
fail_unless( respond->received.type == REQUEST_ENTRUST, "Wrong type sent." );
respond_destroy( respond ); respond_destroy( respond );
} }
@@ -159,7 +157,6 @@ Suite* readwrite_suite(void)
tcase_add_test(tc_transfer, test_rejects_mismatched_handle); tcase_add_test(tc_transfer, test_rejects_mismatched_handle);
tcase_add_exit_test(tc_transfer, test_accepts_matched_handle, 0); tcase_add_exit_test(tc_transfer, test_accepts_matched_handle, 0);
tcase_add_test( tc_transfer, test_entrust_type_sent );
/* This test is a little funny. We respond with a dodgy handle /* This test is a little funny. We respond with a dodgy handle
* and check that this *doesn't* cause a message rejection, * and check that this *doesn't* cause a message rejection,

@@ -64,7 +64,7 @@ START_TEST( test_replaces_acl )
 {
 struct flexnbd flexnbd;
 flexnbd.signal_fd = -1;
-struct server * s = server_create( &flexnbd, "127.0.0.1", "0", dummy_file, 0, 0, NULL, 1, 1 );
+struct server * s = server_create( &flexnbd, "127.0.0.1", "0", dummy_file, 0, 0, NULL, 1, 0, 1 );
 struct acl * new_acl = acl_create( 0, NULL, 0 );
 server_replace_acl( s, new_acl );
@@ -79,7 +79,7 @@ START_TEST( test_signals_acl_updated )
 {
 struct flexnbd flexnbd;
 flexnbd.signal_fd = -1;
-struct server * s = server_create( &flexnbd, "127.0.0.1", "0", dummy_file, 0, 0, NULL, 1, 1 );
+struct server * s = server_create( &flexnbd, "127.0.0.1", "0", dummy_file, 0, 0, NULL, 1, 0, 1 );
 struct acl * new_acl = acl_create( 0, NULL, 0 );
 server_replace_acl( s, new_acl );
@@ -148,7 +148,7 @@ START_TEST( test_acl_update_closes_bad_client )
 */
 struct flexnbd flexnbd;
 flexnbd.signal_fd = -1;
-struct server * s = server_create( &flexnbd, "127.0.0.7", "0", dummy_file, 0, 0, NULL, 1, 1 );
+struct server * s = server_create( &flexnbd, "127.0.0.7", "0", dummy_file, 0, 0, NULL, 1, 0, 1 );
 struct acl * new_acl = acl_create( 0, NULL, 1 );
 struct client * c;
 struct client_tbl_entry * entry;
@@ -193,7 +193,7 @@ START_TEST( test_acl_update_leaves_good_client )
 struct flexnbd flexnbd;
 flexnbd.signal_fd = -1;
-struct server * s = server_create( &flexnbd, "127.0.0.7", "0", dummy_file, 0, 0, NULL, 1, 1 );
+struct server * s = server_create( &flexnbd, "127.0.0.7", "0", dummy_file, 0, 0, NULL, 1, 0, 1 );
 char *lines[] = {"127.0.0.1"};
 struct acl * new_acl = acl_create( 1, lines, 1 );

tests/unit/check_sockutil.c Normal file

@@ -0,0 +1,113 @@
#include <sys/types.h>
#include <arpa/inet.h>
#include <netinet/tcp.h>
#include "sockutil.h"
#include <check.h>
START_TEST( test_sockaddr_address_string_af_inet_converts_to_string )
{
struct sockaddr sa;
struct sockaddr_in* v4 = (struct sockaddr_in*) &sa;
char testbuf[128];
const char* result;
v4->sin_family = AF_INET;
v4->sin_port = htons( 4777 );
ck_assert_int_eq( 1, inet_pton( AF_INET, "192.168.0.1", &v4->sin_addr ));
result = sockaddr_address_string( &sa, &testbuf[0], 128 );
ck_assert( result != NULL );
ck_assert_str_eq( "192.168.0.1 port 4777", testbuf );
}
END_TEST
START_TEST( test_sockaddr_address_string_af_inet6_converts_to_string )
{
struct sockaddr_in6 v6_raw;
struct sockaddr_in6* v6 = &v6_raw;
struct sockaddr* sa = (struct sockaddr*) &v6_raw;
char testbuf[128];
const char* result;
v6->sin6_family = AF_INET6;
v6->sin6_port = htons( 4777 );
ck_assert_int_eq( 1, inet_pton( AF_INET6, "fe80::1", &v6->sin6_addr ));
result = sockaddr_address_string( sa, &testbuf[0], 128 );
ck_assert( result != NULL );
ck_assert_str_eq( "fe80::1 port 4777", testbuf );
}
END_TEST
/* We don't know what it is, so we just call it "???" and return NULL */
START_TEST( test_sockaddr_address_string_af_unspec_is_failure )
{
struct sockaddr sa;
struct sockaddr_in* v4 = (struct sockaddr_in*) &sa;
char testbuf[128];
const char* result;
v4->sin_family = AF_UNSPEC;
v4->sin_port = htons( 4777 );
ck_assert_int_eq( 1, inet_pton( AF_INET, "192.168.0.1", &v4->sin_addr ));
result = sockaddr_address_string( &sa, &testbuf[0], 128 );
ck_assert( result == NULL );
ck_assert_str_eq( "???", testbuf );
}
END_TEST
/* This is a complete failure to parse, rather than a partial failure */
START_TEST( test_sockaddr_address_string_doesnt_overflow_short_buffer )
{
struct sockaddr sa;
struct sockaddr_in* v4 = (struct sockaddr_in*) &sa;
char testbuf[128];
const char* result;
v4->sin_family = AF_INET;
v4->sin_port = htons( 4777 );
ck_assert_int_eq( 1, inet_pton( AF_INET, "192.168.0.1", &v4->sin_addr ));
result = sockaddr_address_string( &sa, &testbuf[0], 2 );
ck_assert( result == NULL );
ck_assert_str_eq( "??", testbuf );
}
END_TEST
Suite *sockutil_suite(void)
{
Suite *s = suite_create("sockutil");
TCase *tc_sockaddr_address_string = tcase_create("sockaddr_address_string");
tcase_add_test(tc_sockaddr_address_string, test_sockaddr_address_string_af_inet_converts_to_string);
tcase_add_test(tc_sockaddr_address_string, test_sockaddr_address_string_af_inet6_converts_to_string);
tcase_add_test(tc_sockaddr_address_string, test_sockaddr_address_string_af_unspec_is_failure);
tcase_add_test(tc_sockaddr_address_string, test_sockaddr_address_string_doesnt_overflow_short_buffer);
suite_add_tcase(s, tc_sockaddr_address_string);
return s;
}
int main(void)
{
int number_failed;
Suite *s = sockutil_suite();
SRunner *sr = srunner_create(s);
srunner_run_all(sr, CK_NORMAL);
number_failed = srunner_ntests_failed(sr);
srunner_free(sr);
return (number_failed == 0) ? 0 : 1;
}
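The tests above pin down the contract of `sockaddr_address_string` without showing its body. Below is a rough sketch of one way to meet the IPv4, IPv6 and unknown-family cases. The name `sketch_sockaddr_address_string` and the always-terminate-within-`len` policy are assumptions, not flexnbd's actual code. It relies only on POSIX `inet_ntop`, which fails with `ENOSPC` rather than truncating. Because this sketch always NUL-terminates inside `len`, a 2-byte buffer yields `"?"`. The short-buffer test instead expects `"??"`, and that only holds if byte 2 of the buffer, which lies outside the `len` the caller passed, happens to be NUL.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical sketch, not flexnbd's sockutil.c: render an IPv4/IPv6
 * sockaddr as "<address> port <port>" into dest, using at most len bytes
 * and always NUL-terminating. Returns dest on success. Returns NULL after
 * writing as much of "???" as fits when the family is unknown or the
 * text does not fit. */
const char *sketch_sockaddr_address_string(const struct sockaddr *sa,
                                           char *dest, size_t len)
{
    const void *addr = NULL;
    unsigned short port = 0;

    if (len == 0) {
        return NULL;
    }

    if (sa->sa_family == AF_INET) {
        const struct sockaddr_in *v4 = (const struct sockaddr_in *)sa;
        addr = &v4->sin_addr;
        port = ntohs(v4->sin_port);
    } else if (sa->sa_family == AF_INET6) {
        const struct sockaddr_in6 *v6 = (const struct sockaddr_in6 *)sa;
        addr = &v6->sin6_addr;
        port = ntohs(v6->sin6_port);
    }

    /* Unknown family, or inet_ntop() reports ENOSPC for a short buffer. */
    if (addr == NULL ||
        inet_ntop(sa->sa_family, addr, dest, (socklen_t)len) == NULL) {
        snprintf(dest, len, "%s", "???");
        return NULL;
    }

    /* Append the port; snprintf() reports truncation via its return value. */
    size_t used = strlen(dest);
    int n = snprintf(dest + used, len - used, " port %hu", port);
    if (n < 0 || (size_t)n >= len - used) {
        snprintf(dest, len, "%s", "???");
        return NULL;
    }
    return dest;
}
```

Failing on an address that fits but whose port suffix does not is a design choice. It avoids handing back a string that looks valid but is missing its port.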

@@ -2,105 +2,343 @@
 #include "serve.h"
 #include "ioutil.h"
 #include "util.h"
+#include "bitset.h"
 #include <check.h>
+struct server* mock_server(void)
+{
+struct server* out = xmalloc( sizeof( struct server ) );
+out->l_start_mirror = flexthread_mutex_create();
+out->nbd_client = xmalloc( sizeof( struct client_tbl_entry ) * 4 );
+out->max_nbd_clients = 4;
+out->size = 65536;
+out->allocation_map = bitset_alloc( 65536, 4096 );
+return out;
+}
+struct server* mock_mirroring_server(void)
+{
+struct server *out = mock_server();
+out->mirror = xmalloc( sizeof( struct mirror ) );
+out->mirror_super = xmalloc( sizeof( struct mirror_super ) );
+return out;
+}
+void destroy_mock_server( struct server* serve )
+{
+if ( NULL != serve->mirror ) {
+free( serve->mirror );
+}
+if ( NULL != serve->mirror_super ) {
+free( serve->mirror_super );
+}
+flexthread_mutex_destroy( serve->l_start_mirror );
+bitset_free( serve->allocation_map );
+free( serve->nbd_client );
+free( serve );
+}
 START_TEST( test_status_create )
 {
-struct server server;
-struct status *status = NULL;
-status = status_create( &server );
+struct server * server = mock_server();
+struct status * status = status_create( server );
 fail_if( NULL == status, "Status wasn't allocated" );
 status_destroy( status );
+destroy_mock_server( server );
 }
 END_TEST
 START_TEST( test_gets_has_control )
 {
-struct server server;
-struct status * status;
-server.has_control = 1;
-status = status_create( &server );
+struct server * server = mock_server();
+server->success = 1;
+struct status * status = status_create( server );
 fail_unless( status->has_control == 1, "has_control wasn't copied" );
 status_destroy( status );
+destroy_mock_server( server );
 }
 END_TEST
 START_TEST( test_gets_is_mirroring )
 {
-struct server server;
-struct status * status;
-server.mirror = NULL;
-status = status_create( &server );
+struct server * server = mock_server();
+struct status * status = status_create( server );
 fail_if( status->is_mirroring, "is_mirroring was set" );
 status_destroy( status );
-server.mirror = (struct mirror *)xmalloc( sizeof( struct mirror ) );
-status = status_create( &server );
+destroy_mock_server( server );
+server = mock_mirroring_server();
+status = status_create( server );
 fail_unless( status->is_mirroring, "is_mirroring wasn't set" );
 status_destroy( status );
+destroy_mock_server( server );
+}
+END_TEST
+START_TEST( test_gets_clients_allowed )
+{
+struct server * server = mock_server();
+struct status * status = status_create( server );
+fail_if( status->clients_allowed, "clients_allowed was set" );
+status_destroy( status );
+server->allow_new_clients = 1;
+status = status_create( server );
+fail_unless( status->clients_allowed, "clients_allowed was not set" );
+status_destroy( status );
+destroy_mock_server( server );
+}
+END_TEST
+START_TEST( test_gets_num_clients )
+{
+struct server * server = mock_server();
+struct status * status = status_create( server );
+fail_if( status->num_clients != 0, "num_clients was wrong" );
+status_destroy( status );
+server->nbd_client[0].thread = 1;
+server->nbd_client[1].thread = 1;
+status = status_create( server );
+fail_unless( status->num_clients == 2, "num_clients was wrong" );
+status_destroy( status );
+destroy_mock_server( server );
+}
+END_TEST
+START_TEST( test_gets_pid )
+{
+struct server * server = mock_server();
+struct status * status = status_create( server );
+fail_unless( getpid() == status->pid, "Pid wasn't gathered" );
+status_destroy( status );
+destroy_mock_server( server );
+}
+END_TEST
+START_TEST( test_gets_size )
+{
+struct server * server = mock_server();
+server->size = 1024;
+struct status * status = status_create( server );
+fail_unless( 1024 == status->size, "Size wasn't gathered" );
+status_destroy( status );
+destroy_mock_server( server );
+}
+END_TEST
+START_TEST( test_gets_migration_statistics )
+{
+struct server * server = mock_mirroring_server();
+server->mirror->all_dirty = 16384;
+server->mirror->max_bytes_per_second = 32768;
+server->mirror->offset = 0;
+/* we have a bit of a time dependency here */
+server->mirror->migration_started = monotonic_time_ms();
+struct status * status = status_create( server );
+fail_unless (
+0 == status->migration_duration ||
+1 == status->migration_duration ||
+2 == status->migration_duration,
+"migration_duration is unreasonable!"
+);
+fail_unless(
+16384 / ( status->migration_duration + 1 ) == status->migration_speed,
+"migration_speed not calculated correctly"
+);
+fail_unless( 32768 == status->migration_speed_limit, "migration_speed_limit not read" );
+// ( size / current_bps ) + 1 happens to be 3 for this test
+fail_unless( 3 == status->migration_seconds_left, "migration_seconds_left not gathered" );
+status_destroy( status );
+destroy_mock_server( server );
 }
 END_TEST
+#define RENDER_TEST_SETUP \
+struct status status; \
+int fds[2]; \
+pipe( fds );
+void fail_unless_rendered( int fd, char *fragment )
+{
+char buf[1024] = {0};
+char emsg[1024] = {0};
+char *found = NULL;
+sprintf(emsg, "Fragment: %s not found", fragment );
+fail_unless( read_until_newline( fd, buf, 1024 ) > 0, "Couldn't read" );
+found = strstr( buf, fragment );
+fail_if( NULL == found, emsg );
+return;
+}
+void fail_if_rendered( int fd, char *fragment )
+{
+char buf[1024] = {0};
+char emsg[1024] = {0};
+char *found = NULL;
+sprintf(emsg, "Fragment: %s found", fragment );
+fail_unless( read_until_newline( fd, buf, 1024 ) > 0, "Couldn't read" );
+found = strstr( buf, fragment );
+fail_unless( NULL == found, emsg );
+return;
+}
 START_TEST( test_renders_has_control )
 {
-struct status status;
-int fds[2];
-pipe(fds);
-char buf[1024] = {0};
+RENDER_TEST_SETUP
 status.has_control = 1;
 status_write( &status, fds[1] );
-fail_unless( read_until_newline( fds[0], buf, 1024 ) > 0,
-"Couldn't read the result" );
-char *found = strstr( buf, "has_control=true" );
-fail_if( NULL == found, "has_control=true not found" );
+fail_unless_rendered( fds[0], "has_control=true" );
 status.has_control = 0;
 status_write( &status, fds[1] );
-fail_unless( read_until_newline( fds[0], buf, 1024 ) > 0,
-"Couldn't read the result" );
-found = strstr( buf, "has_control=false" );
-fail_if( NULL == found, "has_control=false not found" );
+fail_unless_rendered( fds[0], "has_control=false" );
 }
 END_TEST
 START_TEST( test_renders_is_mirroring )
 {
-struct status status;
-int fds[2];
-pipe(fds);
-char buf[1024] = {0};
+RENDER_TEST_SETUP
 status.is_mirroring = 1;
 status_write( &status, fds[1] );
-fail_unless( read_until_newline( fds[0], buf, 1024 ) > 0,
-"Couldn't read the result" );
-char *found = strstr( buf, "is_mirroring=true" );
-fail_if( NULL == found, "is_mirroring=true not found" );
+fail_unless_rendered( fds[0], "is_mirroring=true" );
 status.is_mirroring = 0;
 status_write( &status, fds[1] );
-fail_unless( read_until_newline( fds[0], buf, 1024 ) > 0,
-"Couldn't read the result" );
-found = strstr( buf, "is_mirroring=false" );
-fail_if( NULL == found, "is_mirroring=false not found" );
+fail_unless_rendered( fds[0], "is_mirroring=false" );
+}
+END_TEST
+START_TEST( test_renders_clients_allowed )
+{
+RENDER_TEST_SETUP
+status.clients_allowed = 1;
+status_write( &status, fds[1] );
+fail_unless_rendered( fds[0], "clients_allowed=true" );
+status.clients_allowed = 0;
+status_write( &status, fds[1] );
+fail_unless_rendered( fds[0], "clients_allowed=false" );
+}
+END_TEST
+START_TEST( test_renders_num_clients )
+{
+RENDER_TEST_SETUP
+status.num_clients = 0;
+status_write( &status, fds[1] );
+fail_unless_rendered( fds[0], "num_clients=0" );
+status.num_clients = 4000;
+status_write( &status, fds[1] );
+fail_unless_rendered( fds[0], "num_clients=4000" );
+}
+END_TEST
+START_TEST( test_renders_pid )
+{
+RENDER_TEST_SETUP
+status.pid = 42;
+status_write( &status, fds[1] );
+fail_unless_rendered( fds[0], "pid=42" );
+}
+END_TEST
+START_TEST( test_renders_size )
+{
+RENDER_TEST_SETUP
+status.size = ( (uint64_t)1 << 33 );
+status_write( &status, fds[1] );
+fail_unless_rendered( fds[0], "size=8589934592" );
+}
+END_TEST
+START_TEST( test_renders_migration_statistics )
+{
+RENDER_TEST_SETUP
+status.is_mirroring = 0;
+status.migration_duration = 8;
+status.migration_speed = 40000000;
+status.migration_speed_limit = 40000001;
+status.migration_seconds_left = 1;
+status_write( &status, fds[1] );
+fail_if_rendered( fds[0], "migration_duration" );
+status_write( &status, fds[1] );
+fail_if_rendered( fds[0], "migration_speed" );
+status_write( &status, fds[1] );
+fail_if_rendered( fds[0], "migration_speed_limit" );
+status_write( &status, fds[1] );
+fail_if_rendered( fds[0], "migration_seconds_left" );
+status.is_mirroring = 1;
+status_write( &status, fds[1] );
+fail_unless_rendered( fds[0], "migration_duration=8" );
+status_write( &status, fds[1] );
+fail_unless_rendered( fds[0], "migration_speed=40000000" );
+status_write( &status, fds[1] );
+fail_unless_rendered( fds[0], "migration_speed_limit=40000001" );
+status_write( &status, fds[1] );
+fail_unless_rendered( fds[0], "migration_seconds_left=1" );
+status.migration_speed_limit = UINT64_MAX;
+status_write( &status, fds[1] );
+fail_if_rendered( fds[0], "migration_speed_limit" );
 }
 END_TEST
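The render tests only check that particular `key=value` fragments do or do not appear in the line `status_write` emits. Below is a minimal sketch of a formatter consistent with those checks, assuming one space-separated, newline-terminated line per call. `struct sketch_status` and `sketch_status_render` are hypothetical names, and the field order is a guess. Booleans render as `true`/`false`. The migration fields appear only while mirroring. `migration_speed_limit` is dropped when it holds `UINT64_MAX`, which the last test treats as a sentinel, presumably meaning "no limit".

```c
#include <inttypes.h>
#include <stdarg.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical mirror of the fields the render tests exercise. */
struct sketch_status {
    pid_t pid;
    uint64_t size;
    int has_control;
    int clients_allowed;
    int num_clients;
    int is_mirroring;
    uint64_t migration_duration;
    uint64_t migration_speed;
    uint64_t migration_speed_limit; /* UINT64_MAX: omitted from the output */
    uint64_t migration_seconds_left;
};

/* Append printf-style text at buf + *used; returns -1 if it would not fit. */
static int sketch_append(char *buf, size_t len, size_t *used, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(buf + *used, len - *used, fmt, ap);
    va_end(ap);
    if (n < 0 || (size_t)n >= len - *used) {
        return -1;
    }
    *used += (size_t)n;
    return 0;
}

static const char *sketch_bool(int b) { return b ? "true" : "false"; }

/* Render one newline-terminated line of key=value pairs. Returns the number
 * of bytes written (excluding the NUL), or -1 if buf is too small. */
int sketch_status_render(const struct sketch_status *s, char *buf, size_t len)
{
    size_t used = 0;

    if (sketch_append(buf, len, &used,
                      "pid=%ld size=%" PRIu64 " has_control=%s clients_allowed=%s"
                      " num_clients=%d is_mirroring=%s",
                      (long)s->pid, s->size, sketch_bool(s->has_control),
                      sketch_bool(s->clients_allowed), s->num_clients,
                      sketch_bool(s->is_mirroring)) < 0) {
        return -1;
    }

    if (s->is_mirroring) {
        /* Migration figures are only reported while a mirror is running. */
        if (sketch_append(buf, len, &used, " migration_duration=%" PRIu64
                          " migration_speed=%" PRIu64,
                          s->migration_duration, s->migration_speed) < 0) {
            return -1;
        }
        if (s->migration_speed_limit != UINT64_MAX &&
            sketch_append(buf, len, &used, " migration_speed_limit=%" PRIu64,
                          s->migration_speed_limit) < 0) {
            return -1;
        }
        if (sketch_append(buf, len, &used, " migration_seconds_left=%" PRIu64,
                          s->migration_seconds_left) < 0) {
            return -1;
        }
    }

    if (sketch_append(buf, len, &used, "\n") < 0) {
        return -1;
    }
    return (int)used;
}
```

This sketch renders into a caller-supplied buffer instead of a file descriptor so that it stays easy to test. A caller could pass the result to `write(2)`.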
@@ -114,9 +352,20 @@ Suite *status_suite(void)
 tcase_add_test(tc_create, test_status_create);
 tcase_add_test(tc_create, test_gets_has_control);
 tcase_add_test(tc_create, test_gets_is_mirroring);
+tcase_add_test(tc_create, test_gets_clients_allowed);
+tcase_add_test(tc_create, test_gets_num_clients);
+tcase_add_test(tc_create, test_gets_pid);
+tcase_add_test(tc_create, test_gets_size);
+tcase_add_test(tc_create, test_gets_migration_statistics);
 tcase_add_test(tc_render, test_renders_has_control);
 tcase_add_test(tc_render, test_renders_is_mirroring);
+tcase_add_test(tc_render, test_renders_clients_allowed);
+tcase_add_test(tc_render, test_renders_num_clients);
+tcase_add_test(tc_render, test_renders_pid);
+tcase_add_test(tc_render, test_renders_size);
+tcase_add_test(tc_render, test_renders_migration_statistics);
 suite_add_tcase(s, tc_create);
 suite_add_tcase(s, tc_render);
@@ -136,4 +385,3 @@ int main(void)
 return (number_failed == 0) ? 0 : 1;
 }
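On the gathering side, `test_gets_num_clients` marks client-table slots as busy by giving them a non-zero `thread`. `test_gets_migration_statistics` expects `migration_speed` to equal `all_dirty` divided by `migration_duration + 1`. The helpers below sketch those two calculations. The names and the simplified slot struct are hypothetical. Treating `migration_duration` as whole seconds is also an assumption, based on the test accepting only 0, 1 or 2 immediately after the migration starts.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified client-table slot; 0 means the slot is free. */
struct sketch_client_slot {
    unsigned long thread;
};

/* Count occupied slots among the first max entries. */
int sketch_count_clients(const struct sketch_client_slot *slots, size_t max)
{
    int count = 0;
    for (size_t i = 0; i < max; i++) {
        if (slots[i].thread != 0) {
            count++;
        }
    }
    return count;
}

/* Average rate in bytes per second; the "+ 1" keeps the division defined
 * during the first second of a migration. */
uint64_t sketch_migration_speed(uint64_t bytes, uint64_t elapsed_seconds)
{
    return bytes / (elapsed_seconds + 1);
}
```

The `+ 1` trades accuracy for safety: it avoids a division by zero at the start of a migration, but it understates the rate while the elapsed time is still small.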