261 Commits

Author SHA1 Message Date
Patrick J Cherry
102738d9ad Updated logging output during readloop() and writeloop() failures
There's a handy SHOW_ERRNO macro we can use to get consistent logging
for system call failures from readloop() and writeloop().
2018-04-27 10:45:42 +01:00
James Carter
3e0d30f6b9 Merge branch 'reinstate-sync-after-every-write' into 'develop'
Reinstate sync after every write

See merge request open-source/flexnbd-c!51
2018-04-24 12:02:53 +01:00
Patrick J Cherry
3b1a150315 Updated changelog 2018-04-24 10:27:46 +01:00
Patrick J Cherry
ead6328d80 Force sync after every write 2018-04-24 10:27:02 +01:00
James Carter
20b4f069c8 Merge branch 'release-to-master' into 'develop'
Merge back to develop

See merge request open-source/flexnbd-c!49
2018-02-20 11:52:37 +00:00
Patrick J Cherry
331ca4be14 Updated changelog for release 2018-02-20 11:45:42 +00:00
James Carter
fb5714765c Merge branch 'fix-formatting' into 'develop'
Formatted all code using `indent`

See merge request open-source/flexnbd-c!47
2018-02-20 11:42:25 +00:00
Patrick J Cherry
af3bb16ff7 Merge branch 'develop' into fix-formatting 2018-02-20 11:06:58 +00:00
Patrick J Cherry
9cbcc7c95a Added note about the test file formatting 2018-02-20 11:05:36 +00:00
Patrick J Cherry
8893cd06c4 Re-formatted tests with a bit of tinkering by hand 2018-02-20 11:02:33 +00:00
James Carter
166db9b1f7 Merge branch 'enable-flags-test' into 'develop'
Enable request flags test

See merge request open-source/flexnbd-c!48
2018-02-20 10:23:42 +00:00
Patrick J Cherry
103bd7ad5b Undo formatting on test suite -- it wasn't right 2018-02-20 10:13:42 +00:00
Patrick J Cherry
7bee1aadfe Enable request flags test
Missed this out when I wrote the test!
2018-02-20 10:11:38 +00:00
Patrick J Cherry
f47f56d4c4 Formatted all code using indent 2018-02-20 10:05:35 +00:00
James Carter
19a1127bde Merge branch 'fix-correct-num-clients-status' into 'develop'
Call the thread cleanup code when requesting `status`

See merge request open-source/flexnbd-c!46
2018-02-20 09:51:37 +00:00
James Carter
073a4ac0fa Merge branch '35-incorrect-struct-type-used-in-readwrite-c' into 'develop'
Resolve "Incorrect struct type used in readwrite.c"

Closes #35

See merge request open-source/flexnbd-c!41
2018-02-20 09:50:25 +00:00
Patrick J Cherry
623007bfff Remove last reference to removed test_gets_num_clients 2018-02-19 10:22:01 +00:00
Patrick J Cherry
27a94a807e Remove the test_gets_num_clients test from the C unit tests
This test was causing problems by using dummy pointers to simulate
connections.  When calling the cleanup code, these pointers were
thought to be real, and the code attempted to clean up threads
referenced by those pointers, causing a segfault.

I've reimplemented the test in the ruby acceptance suite.
2018-02-16 13:46:31 +00:00
Patrick J Cherry
1407407ff4 Updated changelog 2018-02-16 13:00:31 +00:00
Patrick J Cherry
d0439dab88 Call the thread cleanup code when requesting status
This ensures the correct number of connected clients is returned when
the status command is issued.

Previously the thread pool would only be cleaned up on a new connection.
2018-02-16 12:58:03 +00:00
James F. Carter
9f56f38f42 Merge branch 'rationalise-ld-preload-tests' into develop 2018-02-14 16:48:57 +00:00
Chris Elsworth
370d04d971 Merge branch 'take-request-response-size-into-malloc' into 'develop'
Update proxy malloc to add the struct size onto the request/response buffer

See merge request open-source/flexnbd-c!45
2018-02-14 05:28:24 +00:00
Patrick J Cherry
099e29de91 Merge branch 'develop' into 'take-request-response-size-into-malloc'
# Conflicts:
#   debian/changelog
2018-02-13 17:06:41 +00:00
Patrick J Cherry
2e17e8955f Added tests for NBD_MAX_SIZE
This constant is only used in the proxy, so the tests only cover proxy
mode.
2018-02-13 17:04:51 +00:00
Patrick J Cherry
bb1f6ecdf5 Updated changelog 2018-02-13 15:51:09 +00:00
Patrick J Cherry
158379ba7a Use correct constant name. 2018-02-12 19:11:24 +00:00
Patrick J Cherry
1c66b56af1 Update proxy malloc to add the struct size onto the request/response buffer
This alters the meaning of NBD_MAX_SIZE to be the actual max request size
we'll accept over nbd.  Previously it was *nearly* the max size we'd
accept depending on the size of the struct.
2018-02-12 19:04:29 +00:00
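The buffer sizing described in 1c66b56af1 can be sketched roughly as follows. This is an illustration only, not the proxy's code: the function name, the macro names and the 32MB payload limit are assumptions; the header sizes follow the NBD wire format (28-byte request, 16-byte simple reply).

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed payload limit, for illustration only. */
#define EXAMPLE_MAX_PAYLOAD ((size_t) 32 * 1024 * 1024)

/* NBD wire sizes: request = magic 4 + flags 2 + type 2 + handle 8
 * + offset 8 + length 4; simple reply = magic 4 + error 4 + handle 8. */
#define EXAMPLE_REQUEST_HEADER_SIZE ((size_t) 28)
#define EXAMPLE_REPLY_HEADER_SIZE   ((size_t) 16)

/* Hypothetical helper: size a buffer so that a header plus a payload of
 * exactly max_payload bytes fits, making max_payload the true limit. */
static size_t proxy_buffer_size(size_t max_payload)
{
    size_t header = EXAMPLE_REQUEST_HEADER_SIZE > EXAMPLE_REPLY_HEADER_SIZE
        ? EXAMPLE_REQUEST_HEADER_SIZE
        : EXAMPLE_REPLY_HEADER_SIZE;
    return max_payload + header;
}
```

Sizing the buffer as payload plus header is what lets the limit mean "largest payload accepted" rather than "largest payload minus whatever the header takes up".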
Ian Chilton
03d9eb01b5 Merge branch 'increase-log-level-for-readloop-failures' into 'develop'
Increase log level for readloop failures, which might help with diagnosis

See merge request open-source/flexnbd-c!44
2018-02-09 15:38:48 +00:00
Patrick J Cherry
cdcd527544 Refactored read_reply to compare the network-byte-ordered handle 2018-02-09 12:18:34 +00:00
Patrick J Cherry
169d40f575 Increase log level for readloop failures, which might help with diagnosis 2018-02-09 11:57:07 +00:00
Patrick J Cherry
21f384e343 Updated changelog 2018-02-09 11:44:28 +00:00
Patrick J Cherry
9817fd7b0a Final tidies, comments etc. 2018-02-09 11:42:25 +00:00
Patrick J Cherry
195de41d86 Remove extra line 2018-02-09 11:32:26 +00:00
Patrick J Cherry
5b350e10e5 Merge branch 'develop' into '35-incorrect-struct-type-used-in-readwrite-c'
# Conflicts:
#   debian/changelog
2018-02-09 11:29:48 +00:00
Patrick J Cherry
b75a6529d0 Move LdPreload include to correct place 2018-02-09 10:41:24 +00:00
Patrick J Cherry
8e67180999 Check that TCP_NODELAY is set on upstream sockets on reconnection
Also rationalize the test to see if a function has been called.  Still
not great, but getting there :)
2018-02-09 10:26:08 +00:00
Patrick J Cherry
c053a54faa Added test to cover setsockopt for tcpkeepalive 2018-02-08 23:07:17 +00:00
Patrick J Cherry
ebacf738bc Tidy up ld preload hacks 2018-02-08 22:28:34 +00:00
James Carter
c4bab3f81f Merge branch 'truncate-odd-sized-discs' into 'develop'
Discs must be sized in multiples of 512 bytes or odd things happen

See merge request open-source/flexnbd-c!42
2018-02-08 16:49:36 +00:00
Patrick J Cherry
a19267b377 Adjust block-rounding line to match in serve.c 2018-02-08 16:37:36 +00:00
Patrick J Cherry
23d9ff587e Updated changelog 2018-02-08 16:36:20 +00:00
Patrick J Cherry
347b7978e4 Discs must be sized in multiples of 512 bytes or odd things happen
In #36 some of the odd errors were due to seeks beyond the end of the
disc.  This was because the disc was "specially crafted" to be 25GB + 1
byte, which doesn't fit into the normal 512 byte sectors expected of a
disc.  This led to reads going beyond the end of the disc etc.

If a similarly evil disc is used with `losetup`, it just ignores the
last bytes of the disc that don't fit into 512-byte chunks.  This is what
this patch does, logging an error at the same time.
2018-02-08 16:31:28 +00:00
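The rounding described in 347b7978e4 amounts to something like the sketch below. It is not the code from serve.c: `round_down_to_sector` and `EXAMPLE_SECTOR_SIZE` are names invented for the example.

```c
#include <stdint.h>

/* Assumed sector size; the commit message speaks of 512-byte sectors. */
#define EXAMPLE_SECTOR_SIZE UINT64_C(512)

/* Hypothetical helper: drop any trailing bytes that do not fill a whole
 * sector, much as losetup ignores them. */
static uint64_t round_down_to_sector(uint64_t size_in_bytes)
{
    return size_in_bytes - (size_in_bytes % EXAMPLE_SECTOR_SIZE);
}
```

With this helper, a 25GB + 1 byte image would be served as exactly 25GB.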
Patrick J Cherry
f8fec5f57e Alter struct types to reflect reality, avoiding mixing "host" and "raw" structs 2018-02-08 15:46:34 +00:00
James Carter
1672b4b88b Merge branch '36-breaks-when-trying-to-install-debian-from-cd' into 'develop'
Resolve "breaks when trying to install debian from CD"

Closes #36

See merge request open-source/flexnbd-c!40
2018-02-08 13:59:12 +00:00
Patrick J Cherry
5e9dbbd626 Updated changelog 2018-02-08 13:32:10 +00:00
Patrick J Cherry
8beb3f0af6 Allow proxy to pass NBD protocol errors downstream; server returns EINVAL/ENOSPC appropriately
Previously the proxy would just disconnect when it saw an NBD protocol
error, and retry the operation it was in the middle of.

Additionally, the server needs to return the correct error types when
this happens.
2018-02-08 13:19:51 +00:00
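The commit message for 8beb3f0af6 does not spell out which error goes with which failure. A plausible mapping, assuming the convention in the NBD protocol document (ENOSPC for writes past the end of the export, EINVAL for other out-of-range requests), might look like this; the enum and function names are hypothetical.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical command kinds, just enough for the example. */
enum example_cmd { EXAMPLE_CMD_READ, EXAMPLE_CMD_WRITE };

/* Returns 0 if [offset, offset + length) lies within the disc, otherwise
 * the errno value to report downstream.  The comparison is arranged so
 * that offset + length cannot overflow. */
static int example_range_error(enum example_cmd cmd, uint64_t offset,
                               uint32_t length, uint64_t disc_size)
{
    if (offset <= disc_size && length <= disc_size - offset) {
        return 0;
    }
    /* Assumed mapping: writes beyond the end report ENOSPC. */
    return cmd == EXAMPLE_CMD_WRITE ? ENOSPC : EINVAL;
}
```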
James Carter
806de13024 Merge branch 'try-flags' into 'develop'
Set flags to show we can accept FUA and FLUSH commands

See merge request open-source/flexnbd-c!38
2018-02-08 11:18:31 +00:00
Patrick J Cherry
f71b872622 Only set up LD_PRELOAD for tests that actually need it. 2018-02-07 22:05:07 +00:00
Patrick J Cherry
79181b3153 Added LD_PRELOAD library to monitor msync calls in testing 2018-02-07 21:45:20 +00:00
Patrick J Cherry
55548cc969 Change ordering of @env configuration/start so we can alter the blocksize.
argh.
2018-02-06 10:24:54 +00:00
Patrick J Cherry
9bf3b52d54 Call proxy_finish_connect_to_upstream when reconnecting, setting
TCP_NODELAY
2018-02-06 10:02:16 +00:00
Patrick J Cherry
da35187af0 Allow blocksize to be changed in Environment
This number is peppered all over the test suite, so changing @blocksize
for everything is not a goer, when we really only need to change it for
one test.
2018-02-06 09:55:32 +00:00
Patrick J Cherry
7704f9e5c8 Fix tests to reflect new filesize. 2018-02-06 07:57:40 +00:00
Patrick J Cherry
3a86870c9f Use sysconf to determine actual page size for msync
Also added comments in tests around testing for msync offsets/lengths.
2018-02-06 07:32:58 +00:00
Patrick J Cherry
6d6948af09 Fix offset calculation for partial msyncs to go to nearest 4k block
Previously they were always set to zero.
2018-02-05 23:05:00 +00:00
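The offset handling in 6d6948af09, together with the later sysconf change in 3a86870c9f, can be pictured with the sketch below. msync() needs a page-aligned start address, so the start of the range is rounded down and the length grown to match. The struct and function names are assumptions, not the names used in flexnbd.

```c
#include <stdint.h>
#include <unistd.h>

/* Hypothetical pair describing the region handed to msync(). */
struct sync_range {
    uint64_t offset;  /* page-aligned start, in bytes from the mapping base */
    uint64_t length;  /* bytes to sync, counted from the aligned start */
};

/* Page size as reported by the system, falling back to 4096 if
 * sysconf() fails. */
static uint64_t system_page_size(void)
{
    long page_size = sysconf(_SC_PAGESIZE);
    return page_size > 0 ? (uint64_t) page_size : UINT64_C(4096);
}

/* Round the start of [from, from + len) down to a page boundary and
 * grow the length so the original bytes are still covered. */
static struct sync_range page_align(uint64_t from, uint64_t len,
                                    uint64_t page_size)
{
    struct sync_range range;
    range.offset = from - (from % page_size);
    range.length = len + (from - range.offset);
    return range;
}
```

For a write of 100 bytes at offset 5000 with 4096-byte pages, this gives a sync starting at 4096 and covering 1004 bytes.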
Patrick J Cherry
c423900f02 Fix typo 2018-02-05 17:04:23 +00:00
Patrick J Cherry
afa1bb0efb Use msync rather than fsync to flush the entire disc
This involves storing the size of the mapped disc in the client struct,
and then supplying that to the msync command.
2018-02-05 17:01:32 +00:00
Patrick J Cherry
ad2014ac9d Fixed long-standing bug with h2r functions being back to front
h2r seemed to be using beXXtoh functions instead of htobeXX.  Fortunately
ROT13 works symmetrically on our systems..!
2018-02-05 16:16:17 +00:00
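The ROT13 remark in ad2014ac9d can be made concrete. On the usual little-endian and big-endian hosts, converting to and from big-endian order is the same operation (a byte swap or a no-op), so calling the wrong one gives the right answer. The functions below are portable stand-ins written for this example, not the library's htobe64()/be64toh().

```c
#include <stdint.h>

/* Reverse the byte order of a 64-bit value. */
static uint64_t swap_bytes64(uint64_t value)
{
    uint64_t result = 0;
    for (int i = 0; i < 8; i++) {
        result = (result << 8) | (value & 0xff);
        value >>= 8;
    }
    return result;
}

/* Returns 1 on a little-endian host, 0 on a big-endian one. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    return *(const uint8_t *) &probe == 1;
}

/* Stand-in for htobe64(): host order to big-endian (network) order. */
static uint64_t host_to_be64(uint64_t value)
{
    return host_is_little_endian() ? swap_bytes64(value) : value;
}

/* Stand-in for be64toh(): big-endian (network) order to host order.
 * Note that the body is identical to host_to_be64(). */
static uint64_t be64_to_host(uint64_t value)
{
    return host_is_little_endian() ? swap_bytes64(value) : value;
}
```

The two directions are still worth keeping distinct in source code, since the names document which side of the wire a value is on.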
Patrick J Cherry
d1dc7392c2 Open file with O_NOATIME, not O_SYNC
O_SYNC is not necessary as we're not doing direct writes to the file.
O_NOATIME might give some speed boost.
2018-02-05 16:15:36 +00:00
Patrick J Cherry
ba59a4c03f Updated changelog 2018-02-05 08:15:56 +00:00
Patrick J Cherry
2b58468800 Added test for FUA acceptance.
Although I think this might be a bit useless as servers normally just
ignore flags.
2018-02-03 20:29:15 +00:00
Patrick J Cherry
4d9db4d6e9 Added basic FLUSH test 2018-02-03 20:10:47 +00:00
Patrick J Cherry
d6057a4244 Use 'English' in ruby 2018-02-02 21:41:07 +00:00
Patrick J Cherry
1d98ba1d3e Further rubocopping 2018-02-02 21:36:30 +00:00
Patrick J Cherry
9c48da82cc Rubocop 2018-02-02 21:34:14 +00:00
Patrick J Cherry
1b7b688f7a Tidied up nbd init test 2018-02-02 21:30:55 +00:00
Patrick J Cherry
3410ccd4c5 Fixed up commenting around our advertised flags. 2018-02-02 20:50:48 +00:00
Patrick J Cherry
051576df6d Remove warnings about Object#timeout 2018-02-02 20:46:46 +00:00
Patrick J Cherry
9eb7072f49 Removed some extra spaces I'd added 2018-02-02 20:46:25 +00:00
Patrick J Cherry
6aa5907f5e Tidied constants up a bit 2018-02-02 20:34:49 +00:00
Patrick J Cherry
72c8c6f757 Altered test to check for type as a 16-bit uint; added flags test 2018-02-02 20:30:39 +00:00
Patrick J Cherry
b22b99d9b9 Fix fill_request to set flags as well as type. 2018-02-02 20:28:00 +00:00
Patrick J Cherry
ad001cb83c Tidy comments 2018-02-02 16:17:01 +00:00
Patrick J Cherry
f37e4438c8 Merge branch 'develop' into try-flags 2018-02-02 16:05:57 +00:00
Chris Elsworth
084d429961 Merge branch 'update-changelog-for-mr35' into 'develop'
Updated changelog for !35

See merge request open-source/flexnbd-c!39
2018-02-02 14:57:58 +00:00
Patrick J Cherry
1883bee43c Updated changelog for !35 2018-02-02 14:52:26 +00:00
Patrick J Cherry
68a196e93d Allow the proxy connection to pass through flags from upstream. 2018-02-02 10:30:40 +00:00
Patrick J Cherry
1f0ef0aad6 Implement FLUSH command and honour FUA flag
I changed the request struct to break the 32 bits reserved for the
request type into two.  The first part of this is used for the flags
(such as FUA), and the second part for the command type.  Previously
we'd masked the top two bytes, thus ignoring any flags.
2018-02-01 22:13:59 +00:00
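The split described in 1f0ef0aad6 can be illustrated as below, assuming the 32-bit field has already been converted to host byte order. The numeric values come from the NBD protocol; the struct and function names are invented for the example and are not flexnbd's.

```c
#include <stdint.h>

/* Values from the NBD protocol, prefixed to mark them as example names. */
#define EXAMPLE_NBD_CMD_WRITE    1
#define EXAMPLE_NBD_CMD_FLUSH    3
#define EXAMPLE_NBD_CMD_FLAG_FUA (1 << 0)

/* Hypothetical view of the old 32-bit "type" field as two halves. */
struct request_kind {
    uint16_t flags;  /* upper 16 bits: command flags such as FUA */
    uint16_t type;   /* lower 16 bits: the command itself */
};

static struct request_kind split_request_kind(uint32_t host_order_field)
{
    struct request_kind kind;
    kind.flags = (uint16_t) (host_order_field >> 16);
    kind.type  = (uint16_t) (host_order_field & 0xffff);
    return kind;
}
```

Masking with 0xffff alone, as the old code effectively did, keeps `type` but discards `flags`, which is why FUA was silently ignored.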
Patrick J Cherry
25cc084108 First steps towards implementing flags as part of oldstyle negotiation 2018-02-01 19:25:36 +00:00
Patrick J Cherry
f2fa00260b Merge branch 'avoid-crash-on-timeout' into 'develop'
avoid fatal error on client connection timeout

See merge request open-source/flexnbd-c!36
2018-01-26 16:04:51 +00:00
James F. Carter
b2007c9dad debian: update changelog 2018-01-26 15:06:26 +00:00
James F. Carter
9b1781164a avoid fatal error on client connection timeout 2018-01-26 15:03:44 +00:00
Ian Chilton
1f99929589 Merge branch 'develop' into 'develop'
Develop

See merge request open-source/flexnbd-c!35
2018-01-24 12:42:49 +00:00
Chris Cottam
c37627a5b9 not high enough, trying 32MB 2018-01-18 17:08:32 +00:00
Chris Cottam
ceb3328261 increasing the NBD max size to see if it fixes an issue with qemu-2.11.0 2018-01-18 16:52:24 +00:00
Patrick J Cherry
61940bdfc5 Merge branch '34-logging-should-include-the-id-of-the-disc-that-is-being-served' into 'develop'
add a log_context, a string output as part of any log message

Closes #34

See merge request open-source/flexnbd-c!34
2018-01-11 10:35:45 +00:00
James F. Carter
6d96d751d8 debian: update changelog 2018-01-11 10:06:03 +00:00
James F. Carter
fa75de0a8b proxy sets the upstream address and port as its log context 2018-01-11 10:04:18 +00:00
James F. Carter
1cb11bfd38 serve sets the disc's backing file as its log context 2018-01-11 10:03:16 +00:00
James F. Carter
2702e73a26 add a log_context, a string output as part of any log message 2018-01-11 10:01:42 +00:00
Patrick J Cherry
dbf50046a8 Merge branch '33-tcp-keepalive-should-be-applied-to-connection-so-that-dead-connections-can-be-properly-reaped' into 'develop'
apply tcp keepalive to serving sockets

Closes #33

See merge request open-source/flexnbd-c!33
2018-01-10 17:51:02 +00:00
James F. Carter
d62b069ce4 debian: update changelog 2018-01-10 13:58:11 +00:00
James F. Carter
884a714744 whitespace fix 2018-01-10 13:55:05 +00:00
James F. Carter
0c668f1776 remember how || works in C 2018-01-10 13:54:26 +00:00
James F. Carter
1d5b315f17 apply tcp keepalive to serving sockets 2018-01-10 13:49:22 +00:00
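For readers unfamiliar with TCP keepalive, enabling it on a socket on Linux looks roughly like this. The function name and the idea of passing the timings as parameters are assumptions for the sketch; the commit itself does not say which values flexnbd uses.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: turn on keepalive probes so that a dead peer is
 * eventually noticed.  Returns 0 on success, -1 on failure (errno set).
 * TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT are Linux-specific. */
static int enable_keepalive(int fd, int idle_secs, int interval_secs,
                            int probe_count)
{
    int on = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) != 0) {
        return -1;
    }
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE,
                   &idle_secs, sizeof(idle_secs)) != 0) {
        return -1;
    }
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL,
                   &interval_secs, sizeof(interval_secs)) != 0) {
        return -1;
    }
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT,
                   &probe_count, sizeof(probe_count)) != 0) {
        return -1;
    }
    return 0;
}
```

Each call is checked separately so that the first failure is reported with its own errno.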
Patrick J Cherry
24f1e62a73 Merge branch 'release' into 'develop'
Merge changelog back to develop

See merge request !32
2017-07-14 17:41:51 +01:00
Chris Elsworth
5c37cba39b New release 2017-07-14 17:03:56 +01:00
James F. Carter
59f264184b Merge pull request #1 from BytemarkHosting/better-stats
Calculate and return bytes_left in migration statistics
2017-07-14 16:36:50 +01:00
Chris Elsworth
42d206cfb7 Update test 2017-07-14 16:26:25 +01:00
Chris Elsworth
ab3106202a Also return migration_bytes_left 2017-07-14 16:18:34 +01:00
James Carter
e04dead5ce Merge branch 'update-changelog' into 'develop'
Updated changelog.

See merge request !30
2017-04-13 12:52:00 +01:00
Patrick J Cherry
88bc5f0643 Updated changelog. 2017-04-13 12:49:55 +01:00
James Carter
e89c87e2b9 Merge branch 'fix-compiler-flags' into 'develop'
Remove lots of per-cpu compiler flags.

See merge request !28
2017-02-23 12:11:25 +00:00
Patrick J Cherry
9d2ac3f403 Remove lots of per-cpu compiler flags.
These flags appear to cause SIGILL when flexnbd starts on some CPUs.
2017-02-22 17:52:52 +00:00
James Carter
67823bf85b Merge branch '32-package-and-publish-in-gitlab-ci-retire-maker2-job' into 'master'
Resolve "package and publish in gitlab-ci - retire maker2 job"

Closes #32 and #21

See merge request !27
2017-01-23 14:04:43 +00:00
Patrick J Cherry
17d30b86ad Updated build-deps to have libsubunit and ruby-test-unit 2017-01-23 14:00:09 +00:00
Patrick J Cherry
b97bcd6f51 Don't test separately from packaging. Also use correct source "format" 2017-01-23 13:58:04 +00:00
Patrick J Cherry
4d3c15a4d0 Switch to native from quilted packaging 2017-01-23 13:52:22 +00:00
Patrick J Cherry
83d6872a8d Add ruby test dependency 2017-01-23 13:48:19 +00:00
Patrick J Cherry
ab8470aef3 Modernise gitlab-ci 2017-01-23 13:46:42 +00:00
Patrick J Cherry
716df32fd6 Merge remote-tracking branch 'origin/debian' into 32-package-and-publish-in-gitlab-ci-retire-maker2-job 2017-01-23 13:44:44 +00:00
Michel Pollet
1a768d5e9c Merge branch '29-fix-linker-issue' into 'master'
Link against subunit for testing.

This fixes the problems in Debian stretch+.

Closes #29

See merge request !26
2016-10-13 16:47:37 +01:00
Patrick J Cherry
72992c76ac Added libsubunit to the gitlab-ci 2016-10-13 16:42:21 +01:00
Patrick J Cherry
cace8123f4 Link against subunit for testing.
This fixes the problems in Debian stretch+.
2016-10-13 16:39:20 +01:00
Patrick J Cherry
c3b241464a Updated changelog 2016-10-07 12:26:52 +01:00
Patrick J Cherry
4f956e4b9d Merge branch 'master' of gitlab.bytemark.co.uk:open-source/flexnbd-c into debian 2016-10-07 12:24:51 +01:00
James Carter
b4cb2d9240 Merge branch 'fix-wrong-handle-type' into 'master'
Fix up "wrong" handle type from char* to uint64_t

Following from the NBD handle comparison simplifications.

See merge request !25
2016-10-07 10:20:35 +01:00
James Carter
1efb7bada6 Merge branch 'fix-unsigned-longs-in-bitset-test' into 'master'
fix check_bitset test on 32-bit platforms

The use of `unsigned long` and `UL` suffixes caused this test to fail
on 32-bit platforms, where these are just 4, not 8, bytes long.

```
tests/unit/check_bitset.c:73:F:bit:test_bit_ranges:0: longs[32] = 0 SHOULD BE ffffffff
```

See merge request !24
2016-10-07 10:20:08 +01:00
James Carter
6bc2a4c0b9 Merge branch 'fix-cast-from-pointer-to-wrong-size-integer-in-serve' into 'master'
This fixes the compiler warning pointer-to-int-cast in serve.c

```
In file included from src/server/bitset.h:4:0,
                 from src/server/mirror.h:8,
                 from src/server/flexnbd.h:5,
                 from src/server/serve.h:8,
                 from src/server/serve.c:1:
src/server/serve.c: In function 'tryjoin_client_thread':
src/server/serve.c:258:6: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
      (uint64_t)status);
      ^
```

See merge request !23
2016-10-07 09:59:24 +01:00
James Carter
59de76c50c Merge branch 'skip-large-file-test-on-i386' into 'master'
Skip large file test on 32-bit platforms

This test cannot run on 32-bit machines as they cannot access files
larger than 2G.  Makes flexnbd on 32-bit a bit useless really..

See merge request !22
2016-10-07 09:57:09 +01:00
Patrick J Cherry
209da655b3 Skip large file test on 32-bit platforms
This test cannot run on 32-bit machines as they cannot access files
larger than 2G.  Makes flexnbd on 32-bit a bit useless really..
2016-10-06 21:42:52 +01:00
Patrick J Cherry
52b45e6b40 fix check_bitset test on 32-bit platforms
The use of `unsigned long` and `UL` suffixes caused this test to fail
on 32-bit platforms, where these are just 4, not 8, bytes long.

```
tests/unit/check_bitset.c:73:F:bit:test_bit_ranges:0: longs[32] = 0 SHOULD BE ffffffff
```
2016-10-06 21:22:53 +01:00
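The portability problem fixed in 52b45e6b40 is the usual one with `unsigned long`: its width differs between 32-bit and 64-bit platforms, whereas `uint64_t` and `UINT64_C` do not. A small sketch, with a function name chosen for the example rather than taken from check_bitset.c:

```c
#include <stdint.h>

/* Build a value with the low `count` bits set, for count in 0..64.
 * Using uint64_t and UINT64_C keeps the arithmetic 64 bits wide on
 * every platform; 1UL << 40 would overflow where long is 32 bits. */
static uint64_t low_bits_mask(unsigned count)
{
    if (count >= 64) {
        return UINT64_MAX;
    }
    return (UINT64_C(1) << count) - 1;
}
```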
Patrick J Cherry
d279eb7570 Fix up "wrong" handle type from char* to uint64_t
Following from the NBD handle comparison simplifications.
2016-10-06 21:19:15 +01:00
Patrick J Cherry
c07df76ede This fixes the compiler warning pointer-to-int-cast in serve.c
```
In file included from src/server/bitset.h:4:0,
                 from src/server/mirror.h:8,
                 from src/server/flexnbd.h:5,
                 from src/server/serve.h:8,
                 from src/server/serve.c:1:
src/server/serve.c: In function 'tryjoin_client_thread':
src/server/serve.c:258:6: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
      (uint64_t)status);
      ^
```
2016-10-06 21:16:07 +01:00
Patrick J Cherry
e7e99b099c Updated debian packaging, adding in new build-deps. 2016-10-06 16:02:15 +01:00
Patrick J Cherry
b2edd0734a Merge branch 'master' of gitlab.bytemark.co.uk:open-source/flexnbd-c into debian 2016-10-06 16:00:14 +01:00
James Carter
e19d005636 Merge branch '26-fix-function-definition' into 'master'
OK removed the cast and fixed the function def in the test

This should definitely clear the warning.

Closes #26

See merge request !21
2016-10-06 15:59:57 +01:00
Patrick J Cherry
d1e6e835c4 OK removed the cast and fixed the function def in the test
This should definitely clear the warning.
2016-10-06 15:56:57 +01:00
Patrick J Cherry
8fed794fe7 Merge branch 'master' of gitlab.bytemark.co.uk:open-source/flexnbd-c into debian 2016-10-06 15:47:25 +01:00
James Carter
e24efa9864 Merge branch '26-fix-compiler-warning' into 'master'
Resolve "tests/unit/check_readwrite.c causes compiler warnings"

Closes #26

See merge request !20
2016-10-06 15:46:41 +01:00
James Carter
3134d619ef Merge branch '27-fix-make-test' into 'master'
Update Makefile to specify dependencies properly for tests

Closes #27

See merge request !19
2016-10-06 15:44:46 +01:00
Patrick J Cherry
898f3f6c7e Reinstated char * cast to remove compiler warning 2016-10-06 15:43:20 +01:00
Patrick J Cherry
5a1bc21088 Update Makefile to specify dependencies properly for tests 2016-10-06 15:40:15 +01:00
James Carter
deb8f2c53b Merge branch 'fix-check-nbdtypes' into 'master'
Fix up nbdtypes test to correctly use htobe64

Previous change hadn't taken this into account, and hopefully this makes
our test a little clearer.

See merge request !18
2016-10-06 14:50:03 +01:00
Patrick J Cherry
1338d9e910 Fix up nbdtypes test to correctly use htobe64
Previous change hadn't taken this into account, and hopefully this makes
our test a little clearer.
2016-10-06 14:46:29 +01:00
James Carter
47c05174b6 Merge branch 'fix-check-readwrite' into 'master'
Fix check readwrite segfault

Little slip corrected :)

See merge request !17
2016-10-06 14:10:22 +01:00
Patrick J Cherry
191b3bc72c Merge branch 'master' of gitlab.bytemark.co.uk:open-source/flexnbd-c into fix-check-readwrite 2016-10-06 14:06:21 +01:00
James Carter
770ca0d0e5 Merge branch 'fix-test-names' into 'master'
Fixed up internal test names (copy/pasta?)

The test names output by `make check` now reflect reality.

See merge request !16
2016-10-06 14:04:54 +01:00
Patrick J Cherry
6505588f25 Fixed check_readwrite test to pass correct handle to fd_write_reply
The (char*) cast to resp->received.handle.b was causing a segfault
2016-10-06 14:01:47 +01:00
Patrick J Cherry
957707bcfc Fixed up internal test names (copy/pasta?)
The test names output by `make check` now reflect reality.
2016-10-06 13:44:20 +01:00
James Carter
3f01b77221 Merge branch 'update-manpages-again' into 'master'
Updated manpages, replaces a2x with txt2man

This simplifies the build-deps for Debian packages a little, and brings
the docs up to date.

See merge request !15
2016-10-06 13:43:53 +01:00
Patrick J Cherry
0dbea7f8fe Removed extra tabs
Oops
2016-10-06 13:11:07 +01:00
Patrick J Cherry
091aacd16d Updated manpages, replaces a2x with txt2man
This simplifies the build-deps for Debian packages a little, and brings
the docs up to date.
2016-10-06 12:55:05 +01:00
Patrick J Cherry
04b6637451 Merge branch 'failed-tests-cause-error' into 'master'
failures in make check now result in an error



See merge request !13
2016-10-05 17:20:43 +01:00
James F. Carter
7d2eda6cea failures in make check now result in an error 2016-10-05 16:28:27 +01:00
Patrick J Cherry
7e152ca4f2 Merge branch '24-tests-in-gitlab' into 'master'
Resolve "tests should be run in gitlab-ci"

Closes #24

See merge request !12
2016-10-05 15:08:22 +01:00
James F. Carter
fe0125efbc Merge branch 'master' into 24-tests-in-gitlab 2016-10-05 14:27:56 +01:00
James Carter
ebaaa6d671 Merge branch '25-retire-rake' into 'master'
Moved tasks from Rake to Make

The rake file was obsolete, apart from one invocation of ruby in a shell!

Closes #25

See merge request !11
2016-10-05 14:26:06 +01:00
James F. Carter
8cc8588744 Merge branch '24-tests-in-gitlab' of gitlab.bytemark.co.uk:open-source/flexnbd-c into 24-tests-in-gitlab 2016-10-05 13:06:36 +01:00
James F. Carter
5da77ea39a remove unnecessary step in gitlab-ci 2016-10-05 12:52:32 +01:00
James F. Carter
a744965c67 add missing deps on server object files when building check binaries 2016-10-05 12:51:58 +01:00
Patrick J Cherry
d07659f694 Merge branch '24-tests-in-gitlab' of gitlab.bytemark.co.uk:open-source/flexnbd-c into 25-retire-rake 2016-10-05 12:49:53 +01:00
Patrick J Cherry
30562ed900 Added dpkg-dev to requirements
Allows dpkg-architecture to run.
2016-10-05 12:49:25 +01:00
Patrick J Cherry
93c0fa2e92 Merged in gitlab-ci.yml and fixed to use Make
The CI should now use Make instead of Rake
2016-10-05 12:47:24 +01:00
Patrick J Cherry
8dc491fb89 Merge branch '24-tests-in-gitlab' of gitlab.bytemark.co.uk:open-source/flexnbd-c into 25-retire-rake 2016-10-05 12:46:47 +01:00
Patrick J Cherry
ea7cd64fc2 Moved tasks from Rake to Make
The rake file was obsolete, apart from one invocation of ruby in a
shell!
2016-10-05 12:36:06 +01:00
James F. Carter
35d3340708 avoid need for slow-to-install asciidoc in gitlab-ci 2016-10-05 12:10:21 +01:00
James F. Carter
d47a44a204 install asciidoc in gitlab-ci 2016-10-05 12:07:24 +01:00
James F. Carter
d6968d8242 explicitly compile before running tests in gitlab-ci 2016-10-05 12:06:11 +01:00
James F. Carter
bf85e329a0 clean build environment before running tests in gitlab-ci 2016-10-05 12:03:23 +01:00
James F. Carter
edcaef532c revert gitlab-ci to ruby2.1 2016-10-05 11:58:50 +01:00
James F. Carter
cb920e4e9d Merge branch 'master' into 24-tests-in-gitlab 2016-10-05 11:58:21 +01:00
Patrick J Cherry
91d85633b6 Merge branch 'force-encoding-for-ruby21' into 'master'
force binary encoding in a ruby2.1-compatible way



See merge request !10
2016-10-05 11:57:14 +01:00
James Carter
7c516b85a6 Merge branch 'makefile-fixes' into 'master'
Makefile fixes



See merge request !2
2016-10-05 11:55:54 +01:00
James F. Carter
679fa6dbf8 force binary encoding in a ruby2.1-compatible way 2016-10-05 11:54:09 +01:00
James F. Carter
50708326ec try ruby2.3 in gitlab-ci 2016-10-05 11:42:32 +01:00
James F. Carter
d907025d71 try a newer version of ruby in gitlab-ci 2016-10-05 11:39:56 +01:00
James F. Carter
e4d398a078 install net-tools in gitlab-ci 2016-10-05 11:29:42 +01:00
James F. Carter
8de0780125 install libev-dev in gitlab-ci 2016-10-05 11:26:46 +01:00
James F. Carter
0fd16822ea run tests in gitlab-ci 2016-10-05 11:12:39 +01:00
Patrick J Cherry
1e3c61b541 Merge branch '23-fix-unit-tests' into 'master'
update tests to reflect changes in handle storage

Closes #23

See merge request !9
2016-10-05 11:08:21 +01:00
James F. Carter
a09e14b2d4 whitespace fix 2016-10-05 11:06:39 +01:00
James F. Carter
a6710b6c32 update tests to reflect changes in handle storage 2016-10-05 10:57:52 +01:00
Patrick J Cherry
ed3995303f Reinstate doc to all 2016-10-05 10:41:23 +01:00
James Carter
f5de8fb12b Merge branch '20-fix-encoding-failures' into 'master'
Use a BINARY encoded string when doing read/write comparisons.

This is a bit of a cheat really, but `#read` returns an ASCII encoded
string, whereas our Ruby generates UTF-8 encoded strings, causing
assertion failures.

Closes #20

See merge request !8
2016-10-05 10:32:05 +01:00
James F. Carter
99a5f79a52 fixed typo 2016-10-05 10:30:44 +01:00
Patrick J Cherry
356e1fd6a1 Use a BINARY encoded string when doing read/write comparisons.
This is a bit of a cheat really, but `#read` returns an ASCII encoded
string, whereas our Ruby generates UTF-8 encoded strings, causing
assertion failures.

Fixes #20
2016-10-05 10:01:15 +01:00
James Carter
67dcea207d Merge branch '19-fix-double-definition-warnings' into 'master'
Fixes "double-definition of constants" warning

Looks like `#constants.include?` doesn't work as well as `#const_defined?`.

Closes #19

See merge request !6
2016-10-05 10:00:50 +01:00
Patrick J Cherry
d3762162db Fixes "double-definition of constants" warning
Looks like `#constants.include?` doesn't work as well as
`#const_defined?`.
2016-10-05 09:29:07 +01:00
Patrick J Cherry
3571d3f82e Added net-tools to the build-deps for testing
Fixes #21
2016-10-05 09:27:10 +01:00
Patrick J Cherry
4cd7e764bb Updated changelog 2016-10-04 21:22:07 +01:00
Patrick J Cherry
4f535fbb02 Merge branch 'master' of gitlab.bytemark.co.uk:open-source/flexnbd-c into debian 2016-10-04 21:14:26 +01:00
James Carter
218c55fb63 Merge branch 'simplify-nbd-handles-part-deux' into 'master'
Simplified NBD handle comparisons

8 bytes, therefore a uint64_t to compare to, no need for memcmp()

Signed-off-by: Michel Pollet <buserror@gmail.com>

See merge request !5
2016-10-04 15:49:07 +01:00
Michel Pollet
956a602475 Simplified NBD handle comparisons
8 bytes, therefore a uint64_t to compare to, no need for memcmp()

Signed-off-by: Michel Pollet <buserror@gmail.com>
2016-10-04 15:41:48 +01:00
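The simplification in 956a602475 relies on an NBD handle being exactly 8 bytes, so it can be compared as one 64-bit integer. A sketch of the idea follows; the union and its `w` member are assumptions for the example (a `b` byte-array member is mentioned elsewhere in this log).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical handle type: the same 8 bytes viewed either as raw
 * bytes or as a single integer. */
union example_handle {
    uint8_t  b[8];
    uint64_t w;
};

/* The older style: compare byte by byte. */
static int handles_equal_memcmp(const union example_handle *x,
                                const union example_handle *y)
{
    return memcmp(x->b, y->b, sizeof(x->b)) == 0;
}

/* The simpler style: a single integer comparison. */
static int handles_equal_word(const union example_handle *x,
                              const union example_handle *y)
{
    return x->w == y->w;
}
```

Because only equality is tested, byte order does not matter here: two handles have equal bytes exactly when their 64-bit values are equal.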
James Carter
26a0a82f9d Merge branch '12-fix-bind' into 'master'
Attempt at fixing bind() bug

This will prevent the bind() wrapper from looping forever in some cases. I
could not reproduce the issue, but this removes the only infinite loop I
could find.

Closes #12

See merge request !3
2016-10-04 15:41:37 +01:00
Michel Pollet
76e0476113 Attempt at fixing bind() bug
This will prevent the bind() wrapper from looping forever in some cases. I
could not reproduce the issue, but this removes the only infinite loop I
could find.

Signed-off-by: Michel Pollet <buserror@gmail.com>
2016-10-04 15:36:46 +01:00
Michel Pollet
d9651a038c Makefile: don't include *.d's before 'all'
Include any .d file from the build directory, and do that after all the
other targets

Signed-off-by: Michel Pollet <buserror@gmail.com>
2016-10-04 15:32:56 +01:00
Michel Pollet
fcd3d33498 Simplified Makefile
gcc and clang can generate dep files as well as compiling in a single
pass, no need for two.

Signed-off-by: Michel Pollet <buserror@gmail.com>
2016-10-04 15:32:49 +01:00
James Carter
e3360a3a1b Merge branch 'cherry-pick-41f25408' into 'master'
Close socket fix, might relate to migration crashing

This was listed as a bug, and was immediately picked up by the static
analyzer anyway; this is very likely the cause of the
migration-cancel-crash bug.

closes #10 and possibly closes #11

See merge request !1
2016-09-14 11:29:12 +01:00
Michel Pollet
1fefe1a669 Close socket fix, might relate to migration crashing
This was listed as a bug, and was immediately picked up by the static
analyzer anyway; this is very likely the cause of the
migration-cancel-crash bug.

Signed-off-by: Michel Pollet <buserror@gmail.com>
2016-09-14 10:45:49 +01:00
Patrick J Cherry
4ed8d49b2c Updated rules to skip ruby tests, and just use the normal make check 2016-08-31 10:06:07 +01:00
Patrick J Cherry
3af0e84f5f Updated Debian packaging to be in a separate branch.
This should allow us to use git-buildpackage to build our packages.
2016-08-30 21:57:00 +01:00
Patrick J Cherry
ba14943b60 Removed old changelog.template 2016-08-30 21:49:54 +01:00
Patrick J Cherry
4a709e73f8 Moved .hgignore to .gitignore 2016-08-30 21:47:25 +01:00
Patrick J Cherry
91a8946ddc Removed debian directory 2016-08-30 21:46:59 +01:00
nick
20f99b4554 flexnbd: We only require 1/8th of the memory we allocate for bitsets (bits vs. bytes confusion) 2015-05-13 09:25:09 +01:00
nick
c363991cfd Makefile: Add -lm to LLDFLAGS 2015-04-01 12:39:07 +01:00
Alex Young
c41eeff2fc Moved the server-specific files into src/server 2014-03-11 11:05:43 +00:00
Alex Young
5960e4d10b Remove the proxy's dependency on flexnbd.h 2014-03-11 10:37:00 +00:00
Alex Young
f0911b5c6c Tighten up some variable scopes. 2014-03-11 10:24:29 +00:00
Alex Young
b063f41ba8 Avoid a potential null pointer dereference 2014-03-11 09:57:19 +00:00
Alex Young
28c7e43e45 Fix a harmless buffer overflow 2014-03-11 09:49:25 +00:00
Alex Young
9326b6b882 Merge 2014-02-27 16:18:17 +00:00
Alex Young
f93476ebd3 Replace off64_t with uint64_t where it makes sense to do so.
It looks like off64_t was propagated through the code from the return
type of lseek64(), which isn't appropriate in many of the places we're
using it.
2014-02-27 16:04:25 +00:00
Alex Young
666b60ae1c Allow subset reads in prefetch_contains and prefetch_offset 2014-02-27 14:54:18 +00:00
nick
f48bf2b296 Automated merge with ssh://dev/flexnbd-c 2014-02-27 14:33:01 +00:00
nick
705164ae3b Cork/uncork in mirror - socket_connect already sets nodelay 2014-02-27 14:32:54 +00:00
nick
dbe7053bf3 Avoid some false positives 2014-02-27 14:32:26 +00:00
Alex Young
fa8023cf69 Proxy prefetch cache becomes a command-line argument. 2014-02-27 14:21:36 +00:00
nick
aba802d415 bitset: Allocate the right amount of memory
We were calculating the wrong number of words per byte in the first
place, and then passing the number of *words* to malloc, which expects
the number of *bytes*.

Fix both errors
2014-02-27 12:57:09 +00:00
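The two errors this commit message describes (a wrong words/bits ratio, and passing a word count where a byte count is expected) can be illustrated with a minimal sketch. The names `sketch_word_t`, `sketch_words_needed`, `sketch_bytes_needed` and `sketch_bitset_alloc` are hypothetical and are not taken from the flexnbd source; a 64-bit word is assumed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical word type; the real bitset may use a different width. */
typedef uint64_t sketch_word_t;

#define SKETCH_BITS_PER_WORD (sizeof(sketch_word_t) * 8)

/* Number of words needed to hold `bits` bits, rounding up. */
size_t sketch_words_needed(size_t bits)
{
    return (bits + SKETCH_BITS_PER_WORD - 1) / SKETCH_BITS_PER_WORD;
}

/* Number of BYTES to request from malloc(): words times word size.
 * Passing the word count directly would under-allocate by a factor
 * of sizeof(sketch_word_t). */
size_t sketch_bytes_needed(size_t bits)
{
    return sketch_words_needed(bits) * sizeof(sketch_word_t);
}

/* Allocate a zeroed bitset able to hold `bits` bits (bits must be > 0).
 * calloc() takes the element count and element size separately, which
 * makes the words-versus-bytes distinction explicit. */
sketch_word_t *sketch_bitset_alloc(size_t bits)
{
    assert(bits > 0);
    return calloc(sketch_words_needed(bits), sizeof(sketch_word_t));
}
```

Under these assumptions, a 65-bit set needs two words, so the allocator must be asked for 16 bytes rather than 2.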
Alex Young
d146102c2c Cherry-pick extra toolchain Makefile options 2014-02-26 15:56:41 +00:00
Alex Young
5551373073 Merge 2014-02-26 15:37:44 +00:00
Alex Young
77f333423b Apply Michel's tidy-ups 2014-02-26 15:19:03 +00:00
Alex Young
ffa45879d7 Pull back the changelog generation to the simplest thing that can possibly work 2014-02-25 17:24:25 +00:00
Alex Young
2fa1ce8e6b Tweak changelog generation not to skip commits since last tag 2014-02-25 16:35:51 +00:00
nick
6f540ce238 proxy: Turn on TCP_CORK
Now that we're using NODELAY, we should definitely use cork around
writes to the upstream server. This prevents each partial write()
from being its own packet, which would be terrible if it actually
happened with any regularity (we'd mostly see it when the kernel
is stressed, and write() is progressing a few bytes at a time as
a result)
2014-02-25 16:00:48 +00:00
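The corking pattern this commit message describes can be sketched as below. This is an illustration only: `sketch_tcp_cork`, `sketch_write_all` and `sketch_send_corked` are hypothetical names and do not reproduce the `tcp_cork` helper in sockutil. `TCP_CORK` is Linux-specific.

```c
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Set (on = 1) or clear (on = 0) TCP_CORK on a TCP socket.
 * Returns 0 on success, -1 on failure with errno set by setsockopt(). */
int sketch_tcp_cork(int fd, int on)
{
    return setsockopt(fd, IPPROTO_TCP, TCP_CORK, &on, sizeof(on));
}

/* Write exactly len bytes, retrying after partial writes and EINTR. */
static int sketch_write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t) n;
    }
    return 0;
}

/* Send a header and a body between cork/uncork, so that partial
 * write() calls are coalesced by the kernel instead of each being
 * eligible to go out as its own packet. Returns 0 on success. */
int sketch_send_corked(int fd, const void *hdr, size_t hdr_len,
                       const void *body, size_t body_len)
{
    int rc = -1;

    if (sketch_tcp_cork(fd, 1) < 0)
        return -1;
    if (sketch_write_all(fd, hdr, hdr_len) == 0 &&
        sketch_write_all(fd, body, body_len) == 0)
        rc = 0;
    /* Uncork even on failure so anything already queued is flushed. */
    if (sketch_tcp_cork(fd, 0) < 0)
        rc = -1;
    return rc;
}
```

Clearing the option is what pushes out any queued partial frame, which is why the uncork call is made on the failure path as well.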
nick
f9a3447bc9 proxy: Turn on TCP_NODELAY for the proxy->upstream leg
Nagle doesn't actually affect us too badly here, as we don't write
the header and then the data in two separate calls under normal
circumstances, which is the pathological case, but we should have
NODELAY on, regardless
2014-02-25 15:59:05 +00:00
nick
7806ec11ee client: cork/uncork around NBD_REQUEST_READ responses
We don't cork/uncork around NBD_REQUEST_WRITE responses because
they're only 16 bytes, and we're using blocking writes.
2014-02-25 15:45:41 +00:00
nick
1817c13acb sockutil: Add a tcp_cork helper 2014-02-25 15:44:46 +00:00
nick
97c8d7a358 Remove a compile-time optional selection of O_DIRECT (was never used)
The mmap() manpage tells us to avoid using O_DIRECT with mmap() - so
do so.
2014-02-24 13:47:29 +00:00
Alex Young
8cf92af900 Call srand() to make sure request handles are properly randomised 2014-02-24 12:20:50 +00:00
Alex Young
5185be39c9 Merge 2014-02-24 11:25:46 +00:00
Alex Young
374b4c616e Remove unreachable code to make -Wunreachable-code on clang useful. 2014-02-24 11:23:09 +00:00
Alex Young
50ec8fb7cc Depend on either libev4 or libev3, whichever is available 2014-02-24 11:22:26 +00:00
Alex Young
5fc9ad6fd8 Add some build-depends which make doc needs 2014-02-21 21:40:55 +00:00
Alex Young
85c463c4bd Add asciidoc as a Build-Depends 2014-02-21 20:46:44 +00:00
Alex Young
278a3151a8 Update Rakefile to generate debian/changelog.
`rake changelog` and a commit should be run after each `hg tag`.
2014-02-21 19:58:02 +00:00
Alex Young
0ea66b1e04 Added tag 0.1.1 for changeset 303f6859295d 2014-02-21 19:54:25 +00:00
Alex Young
83e3d65be9 Update the Makefile to work with dpkg-buildpackage 2014-02-21 19:39:27 +00:00
Alex Young
4f31bd9340 Switch from a rake-based build to a make-based build.
This commit beefs up the Makefile to do the build, instead of the
Rakefile.

It also removes from the Rakefile the dependency on rake_utils, which
should mean it's ok to build in a schroot.

The files are reorganised to make the Makefile rules more tractable,
although the reorganisation reveals a problem with our current code
organisation.

The problem is that the proxy-specific code transitively depends on the
server code via flexnbd.h, which has a circular dependency on the server
and client structs. This should be broken in a future commit by
separating the flexnbd struct into a shared config struct and
server-specific parts, so that the server code can be moved into
src/server to more accurately show the functional dependencies.
2014-02-21 19:10:55 +00:00
nick
0baf93fd7b proxy: Fix a read corruption issue caused by us failing to reset needles on timeout 2014-02-11 20:43:44 +00:00
nick
175f19b3e7 client: Add a cork TODO pair 2014-02-11 15:22:54 +00:00
nick
8d56316548 client: Start checking for exceptions on the client socket 2014-02-11 14:32:12 +00:00
nick
27f2cc7083 Some debug and whitespace tweaks 2014-02-11 14:31:58 +00:00
nick
8084a41ad2 flexnbd client: Catch a few cases where the killswitch wasn't disarmed 2014-01-28 11:45:27 +00:00
nick
5ca5858929 Increase a timeout on a test to handle slow unlink calls on other filesystems 2014-01-22 12:21:49 +00:00
nick
afcc07a181 Fix stop signal logic broken by the killswitch 2014-01-22 12:16:09 +00:00
nick
dcead04cf6 Fix up the check_util test once more 2014-01-22 12:10:34 +00:00
nick
4f7f5f1745 Fix a few dangling bits in client.h 2014-01-22 12:01:42 +00:00
nick
976e9ba07f Automated merge with ssh://dev.bytemark.co.uk//repos/flexnbd-c 2014-01-22 11:49:26 +00:00
nick
91d9531a60 flexnbd serve: Make the killswitch per-client-thread
This is a bit tricky, but calling shutdown() on a socket in a signal
handler is safe, and (at least in linux) appears to cause any read()
or write() calls blocked on that socket to return, even with SA_RESTART.

I'm not confident enough about the rest of flexnbd's syscall error
handling to turn SA_RESTART off for this signal...
2014-01-22 11:49:21 +00:00
nick
905d66af77 Rework a test 2014-01-22 11:45:35 +00:00
nick
eee7c9644c Another fedora build fix 2014-01-22 11:42:00 +00:00
nick
ce5c51cdcf Fix a test case 2014-01-22 11:40:19 +00:00
nick
c6c53c63ba Fix compilation on fedora 2014-01-22 10:39:29 +00:00
Tristan Heaven
20bd58749e Fix help_text errors for break and status modes 2013-11-07 16:45:04 +00:00
nick
866bf835e6 tests: Fix an uninitialized memory access 2013-10-30 22:46:49 +00:00
nick
53cbe14556 mirror: lengthen the request timeout to 60 seconds
This is complicated slightly by a need to keep the tests fast, so
we introduce an environment variable that can override the constant
2013-10-30 22:45:12 +00:00
nick
cd3281f62d acl: Make some compilers happy 2013-10-30 22:44:15 +00:00
nick
1e5457fed0 mirror: Couple of tiny cleanups 2013-10-30 22:04:41 +00:00
nick
0753369b77 mirror: Turn off the 'begin' timer before continuing 2013-10-30 20:25:50 +00:00
nick
9d9ae40953 Increase loglevel of some allocation map messages 2013-10-30 16:40:32 +00:00
nick
65d4f581b9 mirror: Clean up bps calculation slightly 2013-10-24 15:11:55 +01:00
nick
77c71ccf09 mirror: Ensure the bitset is actually disabled on mirror error 2013-10-23 16:18:00 +01:00
nick
97a923afdf mirror: Don't start migrating until the allocation map is built
There is a fun race that can happen if we begin migrating while the
allocation map is still building. We call bitset_enable_stream()
when the migration begins, which causes the builder to start putting
events into the stream. This is bad all by itself, as it slows the
migration down for no reason, but the stream is a limited-size queue
and there are situations (migration fails and is restarted) where we
can end up with the queue full and nobody able to empty it, freezing
the whole thing.
2013-10-23 15:58:47 +01:00
nick
335261869d mirror: Don't count bytes transferred for the purposes of keeping the stream empty as part of our bwlimit
This prevents a fairly nasty situation occurring where the rate of change on the disc is high enough that
just servicing it generates enough traffic to keep us over the bwlimit threshold indefinitely. That would
cause us to sleep during the only windows we'd ordinarily have to advance the offset.
2013-10-23 15:26:28 +01:00
nick
8cf9cae8c0 mirror: Don't sleep if our stream is filling up 2013-10-23 14:38:27 +01:00
nick
6986c70888 bitset: Swap pthread_cond_broadcast for pthread_cond_signal
Normally we'll only have one thread waiting anyway, but there's no
point activating a race here in the cases where we have > 1 waiting,
so signal is what we want.
2013-09-24 15:28:58 +01:00
nick
4b9ded0e1d bitset: More-efficient implementation of bitset_stream_queued_bytes
Rather than iterating the entire queue every time this function is
called, we instead take a small hit on enqueue and dequeue to keep
a running byte total keyed by event type that we can return.
2013-09-24 15:27:17 +01:00
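The running-total approach this commit message describes can be sketched as below. The event types, the fixed queue size and all `sketch_` names are assumptions made for illustration, and the locking a real multi-threaded stream needs is left out.

```c
#include <stddef.h>
#include <stdint.h>

enum sketch_event_type {
    SKETCH_EVENT_SET = 0,
    SKETCH_EVENT_UNSET = 1,
    SKETCH_EVENT_TYPES          /* number of event types */
};

struct sketch_event {
    enum sketch_event_type type;
    uint64_t from;
    uint64_t len;
};

#define SKETCH_QUEUE_SIZE 8

struct sketch_stream {
    struct sketch_event entries[SKETCH_QUEUE_SIZE];
    size_t head;                                 /* index of oldest entry */
    size_t count;                                /* entries in the queue  */
    uint64_t queued_bytes[SKETCH_EVENT_TYPES];   /* running totals        */
};

/* Enqueue an event and add its length to the total for its type.
 * Returns -1 if the queue is full. */
int sketch_stream_enqueue(struct sketch_stream *s, struct sketch_event ev)
{
    if (s->count == SKETCH_QUEUE_SIZE)
        return -1;
    s->entries[(s->head + s->count) % SKETCH_QUEUE_SIZE] = ev;
    s->count++;
    s->queued_bytes[ev.type] += ev.len;
    return 0;
}

/* Dequeue the oldest event and subtract its length from the total.
 * Returns -1 if the queue is empty. */
int sketch_stream_dequeue(struct sketch_stream *s, struct sketch_event *out)
{
    if (s->count == 0)
        return -1;
    *out = s->entries[s->head];
    s->head = (s->head + 1) % SKETCH_QUEUE_SIZE;
    s->count--;
    s->queued_bytes[out->type] -= out->len;
    return 0;
}

/* O(1) lookup: no iteration over the queue is needed. */
uint64_t sketch_stream_queued_bytes(const struct sketch_stream *s,
                                    enum sketch_event_type type)
{
    return s->queued_bytes[type];
}
```

The trade-off is one addition per enqueue and one subtraction per dequeue in exchange for a constant-time query.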
nick
b177faacd6 mirror: Reduce the mirror convergence window to 5 seconds, from 60
Also remove some obsolete constants
2013-09-24 14:42:21 +01:00
nick
96e60a4a29 Added tag 0.1.0 for changeset acad9e9df53c 2013-09-24 12:27:29 +01:00
148 changed files with 14956 additions and 11767 deletions

9
.gitignore vendored Normal file

@@ -0,0 +1,9 @@
**/*.o
**/*~
flexnbd
build/
pkg/
**/*.orig
**/.*.swp
cscope.out
valgrind.out

27
.gitlab-ci.yml Normal file

@@ -0,0 +1,27 @@
stages:
- package
- publish
package:jessie: &package
stage: package
image: $CI_REGISTRY/docker-images/layers:$DISTRO-deb
variables:
DISTRO: jessie
script:
- package
artifacts:
paths:
- pkg/
package:stretch:
<<: *package
variables:
DISTRO: stretch
publish:
stage: publish
tags:
- shell
script:
- publish


@@ -1,9 +0,0 @@
.o$
~$
^flexnbd$
^build/
^pkg/
\.orig$
.*\.swp$
cscope.out$
valgrind.out$

24
CONTRIBUTING.md Normal file

@@ -0,0 +1,24 @@
# Contribution guide
The code is formatted using the K&R style of "indent".
```
indent -kr <files go here>
```
The C unit tests have also been indented in the same way, but manually adjusted
such that the functions follow the normal libcheck layout.
```c
START_TEST( ... ) {
}
END_TEST
```
Indent tends to mangle the `END_TEST` macro, so that will need adjusting if
`indent` is run over the test files again.

115
Makefile

@@ -1,10 +1,115 @@
#!/usr/bin/make -f
all:
rake build
VPATH=src:tests/unit
DESTDIR?=/
PREFIX?=/usr/local/bin
INSTALLDIR=$(DESTDIR)/$(PREFIX)
ifdef DEBUG
CFLAGS_EXTRA=-g -DDEBUG
LDFLAGS_EXTRA=-g
else
CFLAGS_EXTRA=-O2
endif
all-debug:
DEBUG=1 rake build
CFLAGS_EXTRA += -fPIC --std=gnu99
LDFLAGS_EXTRA += -Wl,--relax,--gc-sections -L$(LIB) -Wl,-rpath-link,$(LIB)
# The -Wunreachable-code warning is only implemented in clang, but it
# doesn't break anything for gcc to see it.
WARNINGS=-Wall \
-Wextra \
-Werror-implicit-function-declaration \
-Wstrict-prototypes \
-Wno-missing-field-initializers \
-Wunreachable-code
CCFLAGS=-D_GNU_SOURCE=1 $(WARNINGS) $(CFLAGS_EXTRA) $(CFLAGS)
LLDFLAGS=-lm -lrt -lev $(LDFLAGS_EXTRA) $(LDFLAGS)
CC?=gcc
LIBS=-lpthread
INC=-I/usr/include/libev -Isrc/common -Isrc/server -Isrc/proxy
COMPILE=$(CC) -MMD $(INC) -c $(CCFLAGS)
LINK=$(CC) $(LLDFLAGS) -Isrc $(LIBS)
LIB=build/
COMMON_SRC := $(wildcard src/common/*.c)
SERVER_SRC := $(wildcard src/server/*.c)
PROXY_SRC := $(wildcard src/proxy/*.c)
COMMON_OBJ := $(COMMON_SRC:src/%.c=build/%.o)
SERVER_OBJ := $(SERVER_SRC:src/%.c=build/%.o)
PROXY_OBJ := $(PROXY_SRC:src/%.c=build/%.o)
SRCS := $(COMMON_SRC) $(SERVER_SRC) $(PROXY_SRC)
OBJS := $(COMMON_OBJ) $(SERVER_OBJ) $(PROXY_OBJ)
all: build doc
build: server proxy
build/%.o: %.c
mkdir -p $(dir $@)
$(COMPILE) $< -o $@
objs: $(OBJS)
build/flexnbd: $(COMMON_OBJ) $(SERVER_OBJ) build/main.o
$(LINK) $^ -o $@
build/flexnbd-proxy: $(COMMON_OBJ) $(PROXY_OBJ) build/proxy-main.o
$(LINK) $^ -o $@
server: build/flexnbd
proxy: build/flexnbd-proxy
CHECK_SRC := $(wildcard tests/unit/*.c)
CHECK_OBJ := $(CHECK_SRC:tests/unit/%.c=build/%.o)
# Why can't we reuse the build/%.o rule above? Not sure.
CHECK_BINS := $(CHECK_SRC:tests/unit/%.c=build/%)
build/check_%: build/check_%.o
$(LINK) $^ -o $@ $(COMMON_OBJ) $(SERVER_OBJ) -lcheck -lsubunit
check_objs: $(CHECK_OBJ)
check_bins: $(CHECK_BINS)
check: $(OBJS) $(CHECK_BINS)
r=true ; for bin in $(CHECK_BINS); do $$bin || r=false; done ; $$r
acceptance: build
cd tests/acceptance && RUBYOPT='-I.' ruby nbd_scenarios -v
test: check acceptance
build/flexnbd.1: README.txt
txt2man -t flexnbd -s 1 $< > $@
build/flexnbd-proxy.1: README.proxy.txt
txt2man -t flexnbd-proxy -s 1 $< > $@
# If we don't pipe to file, gzip clobbers the original, causing make
# to rebuild each time
%.1.gz: %.1
gzip -c -f $< > $@
doc: build/flexnbd.1.gz build/flexnbd-proxy.1.gz
install:
mkdir -p $(INSTALLDIR)
cp build/flexnbd build/flexnbd-proxy $(INSTALLDIR)
clean:
rake clean
rm -rf build/*
.PHONY: clean objs check_objs all server proxy check_bins check doc build test acceptance
# Include extra dependencies at the end, NOT before 'all'
-include $(wildcard build/*.d)


@@ -1,19 +1,14 @@
FLEXNBD-PROXY(1)
================
:doctype: manpage
NAME
----
flexnbd-proxy - A simple NBD proxy
flexnbd-proxy - A simple NBD proxy
SYNOPSIS
--------
*flexnbd-proxy* ['OPTIONS']
flexnbd-proxy --addr ADDR [--port PORT] --conn-addr ADDR
--conn-port PORT [--bind ADDR] [--cache[=CACHE_BYTES]]
[--help] [--verbose] [--quiet]
DESCRIPTION
-----------
flexnbd-proxy is a simple NBD proxy server that implements resilient
connection logic for the client. It connects to an upstream NBD server
@@ -25,10 +20,6 @@ of view of the client) reconnects and retransmits the request, before
returning the response to the client.
USAGE
-----
$ flexnbd-proxy --addr <ADDR> [ --port <PORT> ]
--conn-addr <ADDR> --conn-port <PORT> [--bind <ADDR>] [option]*
Proxy requests from an NBD client to an NBD server, resiliently. Only one
client can be connected at a time, and ACLs cannot be applied to the client, as they
@@ -57,71 +48,73 @@ Only one request may be in-flight at a time under the current architecture; that
doesn't seem to slow things down much relative to alternative options, but may
be changed in the future if it becomes an issue.
Options
~~~~~~~
OPTIONS
*--addr, -l ADDR*:
--addr, -l ADDR
The address to listen on. If this begins with a '/', it is assumed to be
a UNIX domain socket to create. Otherwise, it should be an IPv4 or IPv6
address.
*--port, -p PORT*:
--port, -p PORT
The port to listen on, if --addr is not a UNIX socket.
*--conn-addr, -C ADDR*:
--conn-addr, -C ADDR
The address of the NBD server to connect to. Required.
*--conn-port, -P PORT*:
--conn-port, -P PORT
The port of the NBD server to connect to. Required.
*--help, -h* :
--cache, -c=CACHE_BYTES
If given, the size in bytes of read cache to use. CACHE_BYTES
defaults to 4096.
--help, -h
Show command or global help.
*--verbose, -v* :
--verbose, -v
Output all available log information to STDERR.
*--quiet, -q* :
--quiet, -q
Output as little log information as possible to STDERR.
LOGGING
-------
Log output is sent to STDERR. If --quiet is set, no output will be seen
unless the program terminates abnormally. If neither --quiet nor
Log output is sent to STDERR. If --quiet is set, no output will be
seen unless the program terminates abnormally. If neither --quiet nor
--verbose are set, no output will be seen unless something goes wrong
with a specific request. If --verbose is given, every available log
message will be seen (which, for a debug build, is many). It is not an
error to set both --verbose and --quiet. The last one wins.
with a specific request. If --verbose is given, every available log
message will be seen (which, for a debug build, is many). It is not an
error to set both --verbose and --quiet. The last one wins.
The log line format is:
<TIMESTAMP>:<LEVEL>:<PID> <THREAD> <SOURCEFILE>:<SOURCELINE>: <MSG>
<TIMESTAMP>:<LEVEL>:<PID> <THREAD> <SOURCEFILE>:<SOURCELINE>: <MSG>
*TIMESTAMP*:
<TIMESTAMP>
Time the log entry was made. This is expressed in terms of monotonic ms.
*LEVEL*:
<LEVEL>
This will be one of 'D', 'I', 'W', 'E', 'F' in increasing order of
severity. If flexnbd is started with the --quiet flag, only 'F' will be
seen. If it is started with the --verbose flag, any from 'I' upwards
will be seen. Only if you have a debug build and start it with
--verbose will you see 'D' entries.
severity. If flexnbd is started with the --quiet flag, only 'F' will
be seen. If it is started with the --verbose flag, any from 'I'
upwards will be seen. Only if you have a debug build and start it
with --verbose will you see 'D' entries.
*PID*:
<PID>
This is the process ID.
*THREAD*:
flexnbd-proxy is currently single-threaded, so this should be the same
for all lines. That may not be the case in the future.
<THREAD>
flexnbd-proxy is currently single-threaded, so this should be the
same for all lines. That may not be the case in the future.
*SOURCEFILE:SOURCELINE*:
<SOURCEFILE:SOURCELINE>
Identifies where in the source code this log line can be found.
*MSG*:
<MSG>
A short message describing what's happening, how it's being done, or
if you're very lucky *why* it's going on.
if you're very lucky why it's going on.
Proxying
~~~~~~~~
EXAMPLES
The main point of the proxy mode is to allow clients that would otherwise break
when the NBD server goes away (during a migration, for instance) to see a
@@ -154,31 +147,60 @@ The proxy notices and reconnects, fulfilling any request it has in its buffer.
The data in myfile has been moved between physical servers without the nbd
client process having to be disturbed at all.
BUGS
----
READ CACHE
Should be reported to nick@bytemark.co.uk.
If the --cache option is given at the command line, either without an
argument or with an argument greater than 0, flexnbd-proxy will use a
read-ahead cache. The cache as currently implemented doubles each read
request size, up to a maximum of 2xCACHE_BYTES, and retains the latter
half in a buffer. If the next read request from the client exactly
matches the region held in the buffer, flexnbd-proxy responds from the
cache without making a request to the server.
This pattern is designed to match sequential reads, such as those
performed by a booting virtual machine.
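The read-ahead behaviour described above can be sketched as follows. This is a model of the description only, not the proxy's code: the struct layout and every `sketch_` name are hypothetical. It assumes that requests larger than CACHE_BYTES are passed through without doubling, and it ignores the end of the device.

```c
#include <stdint.h>

struct sketch_prefetch {
    uint64_t cache_bytes;   /* CACHE_BYTES; 0 disables the cache  */
    int      is_full;       /* non-zero when a region is retained */
    uint64_t from;          /* offset of the retained region      */
    uint64_t len;           /* length of the retained region      */
};

/* How many bytes to request upstream for a client read of `len`
 * bytes: doubled, so at most 2 x CACHE_BYTES is ever requested. */
uint64_t sketch_upstream_read_len(const struct sketch_prefetch *p,
                                  uint64_t len)
{
    if (p->cache_bytes == 0 || len > p->cache_bytes)
        return len;
    return len * 2;
}

/* After a doubled upstream read starting at `from`, the first half is
 * returned to the client and the latter half is retained. */
void sketch_prefetch_store(struct sketch_prefetch *p,
                           uint64_t from, uint64_t len)
{
    p->from = from + len;
    p->len = len;
    p->is_full = 1;
}

/* A request is served from the cache only if it exactly matches the
 * retained region. */
int sketch_prefetch_hit(const struct sketch_prefetch *p,
                        uint64_t from, uint64_t len)
{
    return p->is_full && p->from == from && p->len == len;
}
```

With CACHE_BYTES at 4096, a 1024-byte read at offset 0 becomes a 2048-byte upstream read, and a following 1024-byte read at offset 1024 is a hit; a read of a different size or offset is a miss.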
Note: If specifying a cache size, you must use this form:
nbd-client$ flexnbd-proxy --cache=XXXX
That is, the '=' is required. This is a limitation of getopt-long.
If no cache size is given, a size of 4096 bytes is assumed. Caching can
be explicitly disabled by setting a size of 0.
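The '=' requirement follows from how `getopt_long()` treats options declared with `optional_argument`: a value is only recognised when attached to the option. A minimal sketch, assuming a hypothetical `sketch_parse_cache` function and option table rather than the proxy's actual argument handling:

```c
#include <getopt.h>
#include <stdlib.h>

#define SKETCH_DEFAULT_CACHE_BYTES 4096

/* Returns the cache size in bytes; 0 means caching is disabled,
 * either because --cache was not given or because --cache=0 was. */
long sketch_parse_cache(int argc, char **argv)
{
    static const struct option opts[] = {
        { "cache", optional_argument, NULL, 'c' },
        { 0, 0, 0, 0 }
    };
    long cache_bytes = 0;
    int c;

    /* glibc and musl reinitialise the scanner when optind is 0, which
     * lets this function be called on more than one argv. */
    optind = 0;
    while ((c = getopt_long(argc, argv, "c::", opts, NULL)) != -1) {
        if (c == 'c') {
            /* optarg is NULL unless the value was attached with '='. */
            cache_bytes = optarg ? strtol(optarg, NULL, 10)
                                 : SKETCH_DEFAULT_CACHE_BYTES;
        }
    }
    return cache_bytes;
}
```

In this sketch `--cache=8192` yields 8192, while `--cache 8192` yields the default of 4096, because the detached `8192` is treated as a separate non-option argument.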
BUGS
Should be reported via GitHub.
* https://github.com/BytemarkHosting/flexnbd-c/issues
Current issues include:
* Only old-style NBD negotiation is supported
* Only one request may be in-flight at a time
* All I/O is blocking, and signals terminate the process immediately
* UNIX socket support is limited to the listen address
* FLUSH and TRIM commands, and the FUA flag, are not supported
* DISCONNECT requests do not get passed through to the NBD server
* No active timeout-retry of requests - we trust the kernel's idea of failure
* only old-style NBD negotiation is supported;
* only one request may be in-flight at a time;
* all I/O is blocking, and signals terminate the process immediately;
* UNIX socket support is limited to the listen address;
* FLUSH and TRIM commands, and the FUA flag, are not supported;
* DISCONNECT requests do not get passed through to the NBD server;
* no active timeout-retry of requests - we trust the kernel's idea of
failure.
AUTHOR
------
Written by Alex Young <alex@bytemark.co.uk>.
Originally written by Alex Young <alex@blackkettle.org>.
Original concept and core code by Matthew Bloch <matthew@bytemark.co.uk>.
Proxy mode written by Nick Thomas <nick@bytemark.co.uk>
Proxy mode written by Nick Thomas <me@ur.gs>.
COPYING
-------
The full commit history is available on GitHub.
Copyright (c) 2012 Bytemark Hosting Ltd. Free use of this software is
granted under the terms of the GNU General Public License version 3 or
later.
SEE ALSO
flexnbd(1), nbd-client(8), xnbd-server(8), xnbd-client(8)
COPYRIGHT
Copyright (c) 2012-2016 Bytemark Hosting Ltd. Free use of this
software is granted under the terms of the GNU General Public License
version 3 or later.


@@ -1,17 +1,36 @@
FLEXNBD(1)
==========
:doctype: manpage
NAME
----
flexnbd - A fast NBD server
SYNOPSIS
--------
*flexnbd* 'COMMAND' ['OPTIONS']
flexnbd MODE [ ARGS ]
flexnbd serve --addr ADDR --port PORT --file FILE [--sock SOCK]
[--default-deny] [--killswitch] [global_option]* [acl_entry]*
flexnbd listen --addr ADDR --port PORT --file FILE [--sock SOCK]
[--default-deny] [global_option]* [acl_entry]*
flexnbd mirror --addr ADDR --port PORT --sock SOCK [--unlink]
[--bind BIND_ADDR] [global_option]*
flexnbd acl --sock SOCK [acl_entry]+ [global_option]*
flexnbd break --sock SOCK [global_option]*
flexnbd status --sock SOCK [global_option]*
flexnbd read --addr ADDR --port PORT --from OFFSET --size SIZE
[--bind BIND_ADDR] [global_option]*
flexnbd write --addr ADDR --port PORT --from OFFSET --size SIZE
[--bind BIND_ADDR] [global_option]*
flexnbd help [mode] [global_option]*
DESCRIPTION
-----------
Flexnbd is a fast NBD server which supports live migration. Live
migration is performed by writing the data to a new server. A failed
migration will be invisible to any connected clients.
@@ -19,298 +38,290 @@ migration will be invisible to any connected clients.
Flexnbd tries quite hard to preserve sparsity of files it is serving,
even across migrations.
COMMANDS
--------
SERVE MODE
Serve a file.
serve
~~~~~
$ flexnbd serve --addr <ADDR> --port <PORT> --file <FILE>
[--sock <SOCK>] [--default-deny] [global option]* [acl entry]*
[--sock <SOCK>] [--default-deny] [-k] [global_option]*
[acl_entry]*
Serve a file. If any ACL entries are given (which should be IP
If any ACL entries are given (which should be IP
addresses), only those clients listed will be permitted to connect.
flexnbd will continue to serve until a SIGINT, SIGQUIT, or a successful
migration.
Options
^^^^^^^
OPTIONS
*--addr, -l ADDR*:
--addr, -l ADDR
The address to listen on. Required.
*--port, -p PORT*:
--port, -p PORT
The port to listen on. Required.
*--file, -f FILE*:
--file, -f FILE
The file to serve. Must already exist. Required.
*--sock, -s SOCK*:
Path to a control socket to open. You will need this if you want to
--sock, -s SOCK
Path to a control socket to open. You will need this if you want to
migrate, get the current status, or manipulate the access control
list.
*--default-deny, -d*:
How to interpret an empty ACL. If --default-deny is given, an
empty ACL will let no clients connect. If it is not given, an
--default-deny, -d
How to interpret an empty ACL. If --default-deny is given, an
empty ACL will let no clients connect. If it is not given, an
empty ACL will let any client connect.
listen
~~~~~~
--killswitch, -k
If set, we implement a 2-minute timeout on NBD requests and
responses. If a request takes longer than that to complete,
the client is disconnected. This is useful to keep broken
clients from breaking migrations, among other things.
$ flexnbd listen --addr <ADDR> --port <PORT> --file <FILE>
[--sock <SOCK>] [--default-deny] [global option]* [acl entry]*
LISTEN MODE
Listen for an inbound migration, and quit with a status of 0 on
completion.
$ flexnbd listen --addr ADDR --port PORT --file FILE
[--sock SOCK] [--default-deny] [global_option]*
[acl_entry]*
flexnbd will wait for a successful migration, and then quit. The file
to write the inbound migration data to must already exist before you
run 'flexnbd listen'.
Only one sender may connect to send data, and if the sender
disconnects part-way through the migration, the destination will
expect it to reconnect and retry the whole migration. It isn't safe
expect it to reconnect and retry the whole migration. It isn't safe
to assume that a partial migration can be resumed because the
destination has no knowledge of whether a client has made a write to
the source in the interim.
If the migration fails for a reason which the `flexnbd listen` process
If the migration fails for a reason which the 'flexnbd listen' process
can't fix (say, a failed local write), it will exit with an error
status. In this case, the sender will continually retry the migration
until it succeeds, and you will need to restart the `flexnbd listen`
status. In this case, the sender will continually retry the migration
until it succeeds, and you will need to restart the 'flexnbd listen'
process to allow that to happen.
Options
^^^^^^^
As for 'serve'.
OPTIONS
mirror
~~~~~~
As for serve.
$ flexnbd mirror --addr <ADDR> --port <PORT> --sock SOCK
[--unlink] [--bind <BIND-ADDR>] [global option]*
MIRROR MODE
Start a migration from the server with control socket SOCK to the server
listening at ADDR:PORT.
$ flexnbd mirror --addr ADDR --port PORT --sock SOCK [--unlink]
[--bind BIND_ADDR] [global_option]*
Migration can be a slow process. Rather than block the 'flexnbd mirror'
process until it completes, it will exit with a message of "Migration
started" once it has confirmation that the local server was able to
connect to ADDR:PORT and got an NBD header back. To check on the
connect to ADDR:PORT and got an NBD header back. To check on the
progress of a running migration, use 'flexnbd status'.
If the destination unexpectedly disconnects part-way through the
migration, the source will attempt to reconnect and start the migration
again. It is not safe to resume the migration from where it left off
again. It is not safe to resume the migration from where it left off
because the source can't see that the backing store behind the
destination is intact, or even on the same machine.
If the `--unlink` option is given, the local file will be deleted
immediately before the mirror connection is terminated. This allows
If the --unlink option is given, the local file will be deleted
immediately before the mirror connection is terminated. This allows
an otherwise-ambiguous situation to be resolved: if you don't unlink
the file and the flexnbd process at either end is terminated, it's not
possible to tell which copy of the data is canonical. Since the
possible to tell which copy of the data is canonical. Since the
unlink happens as soon as the sender knows that it has transmitted all
the data, there can be no ambiguity.
Note: files smaller than 4096 bytes cannot be mirrored.
Options
^^^^^^^
OPTIONS
*--addr, -l ADDR*:
The address of the remote server to migrate to. Required.
--addr, -l ADDR
The address of the remote server to migrate to. Required.
*--port, -p PORT*:
The port of the remote server to migrate to. Required.
--port, -p PORT
The port of the remote server to migrate to. Required.
*--sock, -s SOCK*:
The control socket of the local server to migrate from. Required.
--sock, -s SOCK
The control socket of the local server to migrate from. Required.
*--unlink, -u*:
Unlink the served file from the local filesystem after successfully
mirroring.
--unlink, -u
Unlink the served file from the local filesystem after
successfully mirroring.
*--bind, -b BIND-ADDR*:
The local address to bind to. You may need this if the remote server
is using an access control list.
--bind, -b BIND_ADDR
The local address to bind to. You may need this if the remote
server is using an access control list.
break
~~~~~
$ flexnbd mirror --sock SOCK [global option]*
BREAK MODE
Stop a running migration.
Options
^^^^^^^
$ flexnbd break --sock SOCK [global_option]*
*--sock, -s SOCK*:
The control socket of the local server whose emigration to stop.
Required.
OPTIONS
--sock, -s SOCK
The control socket of the local server whose migration to stop.
Required.
acl
~~~
$ flexnbd acl --sock <SOCK> [acl entry]+ [global option]*
ACL MODE
Set the access control list of the server with the control socket SOCK
to the given access control list entries.
$ flexnbd acl --sock SOCK [acl_entry]+ [global_option]*
ACL entries are given as IP addresses.
Options
^^^^^^^
OPTIONS
*--sock, -s SOCK*:
The control socket of the server whose ACL to replace.
--sock, -s SOCK
The control socket of the server whose ACL to replace. Required.
status
~~~~~~
$ flexnbd status --sock <SOCK> [global option]*
STATUS MODE
Get the current status of the server with control socket SOCK.
The status will be printed to STDOUT. It is a space-separated list of
key=value pairs. The space character will never appear in a key or
value. Currently reported values are:
$ flexnbd status --sock SOCK [global_option]*
*pid*:
The status will be printed to STDOUT. It is a space-separated list of
key=value pairs. The space character will never appear in a key or
value. Currently reported values are:
pid
The process id of the server listening on SOCK.
*is_mirroring*:
is_mirroring
'true' if this server is sending migration data, 'false' otherwise.
*has_control*:
has_control
'false' if this server was started in 'listen' mode. 'true' otherwise.
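Because neither keys nor values contain spaces, the status output can be consumed with a simple scan. A minimal sketch, assuming a hypothetical helper `sketch_status_get`; the sample line in the usage note is made up for illustration and is not captured output.

```c
#include <stddef.h>
#include <string.h>

/* Copy the value for `key` from a status line into `out`.
 * Returns 0 if the key was found and the value fitted in `out`,
 * -1 otherwise. A trailing newline is treated as a separator. */
int sketch_status_get(const char *status, const char *key,
                      char *out, size_t out_len)
{
    size_t key_len = strlen(key);
    const char *p = status;

    while (*p != '\0') {
        size_t pair_len = strcspn(p, " \n");

        /* A match needs the whole key followed immediately by '='. */
        if (pair_len > key_len && p[key_len] == '=' &&
            strncmp(p, key, key_len) == 0) {
            size_t val_len = pair_len - key_len - 1;

            if (val_len >= out_len)
                return -1;
            memcpy(out, p + key_len + 1, val_len);
            out[val_len] = '\0';
            return 0;
        }
        p += pair_len;
        p += strspn(p, " \n");   /* skip separators */
    }
    return -1;
}
```

For a line such as `pid=1234 is_mirroring=false has_control=true`, looking up `is_mirroring` copies `false` into the caller's buffer, and looking up a key that is absent returns -1.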
read
~~~~
OPTIONS
$ flexnbd read --addr <ADDR> --port <PORT> --from <OFFSET>
--size <SIZE> [--bind BIND-ADDR] [global option]*
--sock, -s SOCK
The control socket of the server of interest. Required.
READ MODE
Connect to the server at ADDR:PORT, and read SIZE bytes starting at
OFFSET in a single NBD query. The returned data will be echoed to
STDOUT. In case of a remote ACL, set the local source address to
BIND-ADDR.
OFFSET in a single NBD query.
Options
^^^^^^^
$ flexnbd read --addr ADDR --port PORT --from OFFSET --size SIZE
[--bind BIND_ADDR] [global_option]*
*--addr, -l ADDR*:
The address of the remote server. Required.
The returned data will be echoed to STDOUT. In case of a remote ACL,
set the local source address to BIND_ADDR.
*--port, -p PORT*:
The port of the remote server. Required.
OPTIONS
*--from, -F OFFSET*:
The byte offset to start reading from. Required. Maximum 2^62.
--addr, -l ADDR
The address of the remote server. Required.
*--size, -S SIZE*:
The number of bytes to read. Required. Maximum 2^30.
--port, -p PORT
The port of the remote server. Required.
*--bind, -b BIND-ADDR*:
The local address to bind to. You may need this if the remote server
is using an access control list.
--from, -F OFFSET
The byte offset to start reading from. Required. Maximum 2^62.
write
~~~~~
--size, -S SIZE
The number of bytes to read. Required. Maximum 2^30.
$ cat ... | flexnbd write --addr <ADDR> --port <PORT> --from <OFFSET>
--size <SIZE> [--bind BIND-ADDR] [global option]*
--bind, -b BIND_ADDR
The local address to bind to. You may need this if the remote
server is using an access control list.
WRITE MODE
Connect to the server at ADDR:PORT, and write SIZE bytes from STDIN
starting at OFFSET in a single NBD query. In case of a remote ACL, set
the local source address to BIND-ADDR.
starting at OFFSET in a single NBD query.
Options
^^^^^^^
$ cat ... | flexnbd write --addr ADDR --port PORT --from OFFSET
--size SIZE [--bind BIND_ADDR] [global_option]*
*--addr, -l ADDR*:
The address of the remote server. Required.
In case of a remote ACL, set the local source address to BIND_ADDR.
*--port, -p PORT*:
The port of the remote server. Required.
OPTIONS
*--from, -F OFFSET*:
The byte offset to start writing from. Required. Maximum 2^62.
--addr, -l ADDR
The address of the remote server. Required.
*--size, -S SIZE*:
The number of bytes to write. Required. Maximum 2^30.
--port, -p PORT
The port of the remote server. Required.
*--bind, -b BIND-ADDR*:
The local address to bind to. You may need this if the remote server
is using an access control list.
--from, -F OFFSET
The byte offset to start writing from. Required. Maximum 2^62.
help
~~~~
--size, -S SIZE
The number of bytes to write. Required. Maximum 2^30.
$ flexnbd help [command] [global option]*
--bind, -b BIND_ADDR
The local address to bind to. You may need this if the remote
server is using an access control list.
Without 'command', show the list of available commands. With 'command',
show help for that command.
HELP MODE
$ flexnbd help [mode] [global_option]*
Without mode, show the list of available modes. With mode, show help for that mode.
GLOBAL OPTIONS
--------------
*--help, -h* :
Show command or global help.
--help, -h Show mode or global help.
*--verbose, -v* :
Output all available log information to STDERR.
*--quiet, -q* :
Output as little log information as possible to STDERR.
--verbose, -v Output all available log information to STDERR.
--quiet, -q Output as little log information as possible to STDERR.
LOGGING
-------
Log output is sent to STDERR. If --quiet is set, no output will be seen
unless the program terminates abnormally. If neither --quiet nor
Log output is sent to STDERR. If --quiet is set, no output will be
seen unless the program terminates abnormally. If neither --quiet nor
--verbose are set, no output will be seen unless something goes wrong
with a specific request. If --verbose is given, every available log
message will be seen (which, for a debug build, is many). It is not an
error to set both --verbose and --quiet. The last one wins.
with a specific request. If --verbose is given, every available log
message will be seen (which, for a debug build, is many). It is not an
error to set both --verbose and --quiet. The last one wins.
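The "last one wins" rule can be pictured as a single left-to-right pass over the arguments. The following is only a sketch: `log_mode` and `pick_log_mode` are names invented for this note and are not taken from the flexnbd sources; only the flag spellings come from the text above.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical names for illustration only; not flexnbd's own option parser. */
enum log_mode { LOG_DEFAULT, LOG_QUIET, LOG_VERBOSE };

/* Scan argv left to right. Each --quiet/-q or --verbose/-v overrides
 * whatever was seen before, so the last flag on the command line wins. */
enum log_mode pick_log_mode(int argc, const char *const *argv)
{
    enum log_mode mode = LOG_DEFAULT;
    int i;

    assert(argc == 0 || argv != NULL);
    for (i = 0; i < argc; i++) {
        if (strcmp(argv[i], "--quiet") == 0 || strcmp(argv[i], "-q") == 0) {
            mode = LOG_QUIET;
        } else if (strcmp(argv[i], "--verbose") == 0 || strcmp(argv[i], "-v") == 0) {
            mode = LOG_VERBOSE;
        }
    }
    return mode;
}
```

With this shape, passing both flags is never an error: the function simply reports whichever one appeared later.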
The log line format is:
<TIMESTAMP>:<LEVEL>:<PID> <THREAD> <SOURCEFILE>:<SOURCELINE>: <MSG>
<TIMESTAMP>:<LEVEL>:<PID> <THREAD> <SOURCEFILE:SOURCELINE>: <MSG>
*TIMESTAMP*:
Time the log entry was made. This is expressed in terms of monotonic ms.
<TIMESTAMP>
Time the log entry was made. This is expressed in terms of monotonic
ms.
*LEVEL*:
<LEVEL>
This will be one of 'D', 'I', 'W', 'E', 'F' in increasing order of
severity. If flexnbd is started with the --quiet flag, only 'F' will be
seen. If it is started with the --verbose flag, any from 'I' upwards
will be seen. Only if you have a debug build and start it with
--verbose will you see 'D' entries.
severity. If flexnbd is started with the --quiet flag, only 'F'
will be seen. If it is started with the --verbose flag, any from 'I'
upwards will be seen. Only if you have a debug build and start it
with --verbose will you see 'D' entries.
*PID*:
<PID>
This is the process ID.
*THREAD*:
There are several pthreads per flexnbd process: a main thread, a serve
thread, a thread per client, and possibly a pair of mirror threads and a
control thread. This field identifies which thread was responsible for
the log line.
<THREAD>
There are several pthreads per flexnbd process: a main thread, a
serve thread, a thread per client, and possibly a pair of mirror
threads and a control thread. This field identifies which thread was
responsible for the log line.
*SOURCEFILE:SOURCELINE*:
<SOURCEFILE:SOURCELINE>
Identifies where in the source code this log line can be found.
*MSG*:
<MSG>
A short message describing what's happening, how it's being done, or
if you're very lucky *why* it's going on.
if you're very lucky why it's going on.
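For anyone post-processing the log, the documented layout can be split with a single `sscanf` call. This is a sketch under assumptions the man page does not confirm: that TIMESTAMP is a decimal integer, that THREAD contains no spaces, and that the source file name contains no colon. `struct log_line` and `parse_log_line` are names made up for this example.

```c
#include <assert.h>
#include <stdio.h>

/* Illustrative record type; the buffer sizes are arbitrary choices. */
struct log_line {
    unsigned long long timestamp_ms; /* assumed to be a decimal integer */
    char level;                      /* one of D, I, W, E, F */
    int pid;
    char thread[64];                 /* assumed to contain no spaces */
    char source_file[128];           /* assumed to contain no ':' */
    int source_line;
    char msg[256];
};

/* Returns 1 if all seven fields were parsed, 0 otherwise. */
int parse_log_line(const char *line, struct log_line *out)
{
    int fields;

    assert(line != NULL && out != NULL);
    fields = sscanf(line, "%llu:%c:%d %63s %127[^:]:%d: %255[^\n]",
                    &out->timestamp_ms, &out->level, &out->pid,
                    out->thread, out->source_file, &out->source_line,
                    out->msg);
    return fields == 7;
}
```

A caller could then filter on `level` or group lines by `thread` without re-parsing the text.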
EXAMPLES
--------
Serving a file
~~~~~~~~~~~~~~
SERVING A FILE
The simplest case is serving a file on the default nbd port:
@@ -320,8 +331,7 @@ The simplest case is serving a file on the default nbd port:
root:x:
$
Reading server status
~~~~~~~~~~~~~~~~~~~~~
READING SERVER STATUS
In order to read a server's status, we need it to open a control socket.
@@ -329,13 +339,12 @@ In order to read a server's status, we need it to open a control socket.
--sock /tmp/flexnbd.sock
$ flexnbd status --sock /tmp/flexnbd.sock
pid=9635 is_mirroring=false has_control=true
$
Note that the status output is newline-terminated.
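Because the status line is a newline-terminated run of `key=value` fields, a caller can pick out one field with plain string functions. The sketch below assumes fields are separated by single spaces and that values contain no spaces, which matches the sample above but is not stated by the man page. `status_lookup` is a hypothetical helper, not part of flexnbd.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy the value for `key` out of a "k1=v1 k2=v2 ...\n" status line.
 * Returns 1 on success, 0 if the key is absent or the value does not fit. */
int status_lookup(const char *status, const char *key, char *value, size_t value_len)
{
    size_t key_len;
    const char *p = status;

    assert(status != NULL && key != NULL && value != NULL);
    key_len = strlen(key);

    while (*p != '\0') {
        size_t field_len = strcspn(p, " \n");   /* length of this "k=v" field */

        if (field_len > key_len && strncmp(p, key, key_len) == 0 && p[key_len] == '=') {
            size_t val_len = field_len - key_len - 1;
            if (val_len >= value_len) {
                return 0;
            }
            memcpy(value, p + key_len + 1, val_len);
            value[val_len] = '\0';
            return 1;
        }
        p += field_len;
        p += strspn(p, " \n");                  /* skip separators */
    }
    return 0;
}
```

Matching on the full key followed by `=` keeps a lookup for a shorter name from hitting a longer field that merely ends with it.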
Migrating
~~~~~~~~~
MIGRATING
To migrate, we need to provide a destination file of the right size.
@@ -361,8 +370,8 @@ With this knowledge in hand, we can start the migration:
$ flexnbd mirror --addr 127.0.0.1 --port 4779 \
--sock /tmp/flex-source.sock
Migration started
[1] + 9648 done build/flexnbd serve --addr 0.0.0.0 --port 4778
[2] + 9651 done build/flexnbd listen --addr 0.0.0.0 --port 4779
[1] + 9648 done flexnbd serve --addr 0.0.0.0 --port 4778
[2] + 9651 done flexnbd listen --addr 0.0.0.0 --port 4779
$
Note that because the file is so small in this case, we see the source
@@ -370,21 +379,25 @@ server quit soon after we start the migration, and the destination
exited at roughly the same time.
BUGS
----
Should be reported to alex@bytemark.co.uk.
Should be reported on GitHub at
* https://github.com/BytemarkHosting/flexnbd-c/issues
AUTHOR
------
Written by Alex Young <alex@bytemark.co.uk>.
Originally written by Alex Young <alex@blackkettle.org>.
Original concept and core code by Matthew Bloch <matthew@bytemark.co.uk>.
Some additions by Nick Thomas <nick@bytemark.co.uk>
Proxy mode written by Nick Thomas <me@ur.gs>.
COPYING
-------
The full commit history is available on GitHub.
Copyright (c) 2012 Bytemark Hosting Ltd. Free use of this software is
granted under the terms of the GNU General Public License version 3 or
later.
SEE ALSO
flexnbd-proxy(1), nbd-client(8), xnbd-server(8), xnbd-client(8)
COPYRIGHT
Copyright (c) 2012-2016 Bytemark Hosting Ltd. Free use of this
software is granted under the terms of the GNU General Public License
version 3 or later.

312
Rakefile

@@ -1,312 +0,0 @@
$: << '../rake_utils/lib'
require 'rake_utils/debian'
include RakeUtils::DSL
CC=ENV['CC'] || "gcc"
DEBUG = ENV.has_key?('DEBUG') &&
%w|yes y ok 1 true t|.include?(ENV['DEBUG'])
ALL_SOURCES = FileList['src/*']
PROXY_ONLY_SOURCES = FileList['src/{proxy-main,proxy}.c']
PROXY_ONLY_OBJECTS = PROXY_ONLY_SOURCES.pathmap( "%{^src,build}X.o" )
SOURCES = ALL_SOURCES.select { |c| c =~ /\.c$/ } - PROXY_ONLY_SOURCES
OBJECTS = SOURCES.pathmap( "%{^src,build}X.o" ) - PROXY_ONLY_OBJECTS
PROXY_SOURCES = FileList['src/{ioutil,nbdtypes,readwrite,sockutil,util,parse}.c'] + PROXY_ONLY_SOURCES
PROXY_OBJECTS = PROXY_SOURCES.pathmap( "%{^src,build}X.o" )
TEST_SOURCES = FileList['tests/unit/*.c']
TEST_OBJECTS = TEST_SOURCES.pathmap( "%{^tests/unit,build/tests}X.o" )
LIBS = %w( pthread )
LDFLAGS = ["-lrt -lev"]
CCFLAGS = %w(
-D_GNU_SOURCE=1
-Wall
-Wextra
-Werror-implicit-function-declaration
-Wstrict-prototypes
-Wno-missing-field-initializers
) + # Added -Wno-missing-field-initializers to shut GCC up over {0} struct initialisers
[ENV['CFLAGS']]
LIBCHECK = File.exists?("/usr/lib/libcheck.a") ?
"/usr/lib/libcheck.a" :
"/usr/local/lib/libcheck.a"
TEST_MODULES = Dir["tests/unit/check_*.c"].map { |n|
File.basename( n )[%r{check_(.+)\.c},1] }
if DEBUG
LDFLAGS << ["-g"]
CCFLAGS << ["-g -DDEBUG"]
else
CCFLAGS << "-O2"
end
desc "Build the binary and man page"
task :build => [:flexnbd, :flexnbd_proxy, :man]
task :default => :build
desc "Build just the flexnbd binary"
task :flexnbd => "build/flexnbd"
desc "Build just the flexnbd-proxy binary"
task :flexnbd_proxy => "build/flexnbd-proxy"
def check(m)
"build/tests/check_#{m}"
end
file "README.txt"
file "README.proxy.txt"
def manpage(name, src)
FileUtils.mkdir_p( "build" )
sh "a2x --destination-dir build --format manpage #{src}"
sh "gzip -f build/#{name}"
end
file "build/flexnbd.1.gz" => "README.txt" do
manpage("flexnbd.1", "README.txt")
end
file "build/flexnbd-proxy.1.gz" => "README.proxy.txt" do
manpage("flexnbd-proxy.1", "README.proxy.txt")
end
desc "Build just the man page"
task :man => ["build/flexnbd.1.gz", "build/flexnbd-proxy.1.gz"]
namespace "test" do
desc "Run all tests"
task 'run' => ["unit", "scenarios"]
desc "Build C tests"
task 'build' => TEST_MODULES.map { |n| check n}
TEST_MODULES.each do |m|
desc "Run tests for #{m}"
task "check_#{m}" => check(m) do
sh check m
end
end
desc "Run C tests"
task 'unit' => 'build' do
TEST_MODULES.each do |n|
ENV['EF_DISABLE_BANNER'] = '1'
sh check n
end
end
desc "Run NBD test scenarios"
task 'scenarios' => ['build/flexnbd', 'build/flexnbd-proxy'] do
sh "cd tests/acceptance; ruby nbd_scenarios -v"
end
end
def gcc_compile( target, source )
FileUtils.mkdir_p File.dirname( target )
sh "#{CC} -Isrc -c #{CCFLAGS.join(' ')} -o #{target} #{source} "
end
def gcc_link(target, objects)
FileUtils.mkdir_p File.dirname( target )
sh "#{CC} #{LDFLAGS.join(' ')} "+
" -Isrc " +
" -o #{target} "+
objects.join(" ") +
" "+LIBS.map { |l| "-l#{l}" }.join(" ")
end
def headers(c)
`#{CC} -Isrc -MM #{c}`.gsub("\\\n", " ").split(" ")[2..-1]
end
rule 'build/flexnbd-proxy' => PROXY_OBJECTS do |t|
gcc_link(t.name, t.sources)
end
rule 'build/flexnbd' => OBJECTS do |t|
gcc_link(t.name, t.sources)
end
file check("client") =>
%w{build/tests/check_client.o
build/self_pipe.o
build/nbdtypes.o
build/flexnbd.o
build/flexthread.o
build/control.o
build/readwrite.o
build/parse.o
build/client.o
build/serve.o
build/acl.o
build/ioutil.o
build/mbox.o
build/mirror.o
build/status.o
build/sockutil.o
build/util.o} do |t|
gcc_link t.name, t.prerequisites + [LIBCHECK]
end
file check("acl") =>
%w{build/tests/check_acl.o
build/parse.o
build/acl.o
build/util.o} do |t|
gcc_link t.name, t.prerequisites + [LIBCHECK]
end
file check( "util" ) =>
%w{build/tests/check_util.o
build/util.o
build/self_pipe.o} do |t|
gcc_link t.name, t.prerequisites + [LIBCHECK]
end
file check("serve") =>
%w{build/tests/check_serve.o
build/self_pipe.o
build/nbdtypes.o
build/control.o
build/readwrite.o
build/parse.o
build/client.o
build/flexthread.o
build/serve.o
build/flexnbd.o
build/mirror.o
build/status.o
build/acl.o
build/mbox.o
build/ioutil.o
build/sockutil.o
build/util.o} do |t|
gcc_link t.name, t.prerequisites + [LIBCHECK]
end
file check("status") =>
%w{
build/tests/check_status.o
build/self_pipe.o
build/nbdtypes.o
build/control.o
build/readwrite.o
build/parse.o
build/client.o
build/flexthread.o
build/serve.o
build/flexnbd.o
build/mirror.o
build/status.o
build/acl.o
build/mbox.o
build/ioutil.o
build/sockutil.o
build/util.o
} do |t|
gcc_link t.name, t.prerequisites + [LIBCHECK]
end
file check("readwrite") =>
%w{build/tests/check_readwrite.o
build/readwrite.o
build/client.o
build/self_pipe.o
build/serve.o
build/parse.o
build/acl.o
build/flexthread.o
build/control.o
build/flexnbd.o
build/mirror.o
build/status.o
build/nbdtypes.o
build/mbox.o
build/ioutil.o
build/sockutil.o
build/util.o} do |t|
gcc_link t.name, t.prerequisites + [LIBCHECK]
end
file check("flexnbd") =>
%w{build/tests/check_flexnbd.o
build/flexnbd.o
build/ioutil.o
build/sockutil.o
build/util.o
build/control.o
build/mbox.o
build/flexthread.o
build/status.o
build/self_pipe.o
build/client.o
build/acl.o
build/parse.o
build/nbdtypes.o
build/readwrite.o
build/mirror.o
build/serve.o} do |t|
gcc_link t.name, t.prerequisites + [LIBCHECK]
end
file check("control") =>
%w{build/tests/check_control.o} + OBJECTS - ["build/main.o", 'build/proxy-main.o', 'build/proxy.o'] do |t|
gcc_link t.name, t.prerequisites + [LIBCHECK]
end
(TEST_MODULES- %w{status control flexnbd acl client serve readwrite util}).each do |m|
tgt = "build/tests/check_#{m}.o"
maybe_obj_name = "build/#{m}.o"
# Take it out in case we're testing one of the utils
deps = ["build/ioutil.o", "build/util.o", "build/sockutil.o"] - [maybe_obj_name]
# Add it back in if it's something we need to compile
deps << maybe_obj_name if OBJECTS.include?( maybe_obj_name )
file check( m ) => deps + [tgt] do |t|
gcc_link(t.name, deps + [tgt, LIBCHECK])
end
end
OBJECTS.zip( SOURCES ).each do |o,c|
file o => [c]+headers(c) do |t| gcc_compile( o, c ) end
end
PROXY_ONLY_OBJECTS.zip( PROXY_ONLY_SOURCES).each do |o, c|
file o => [c]+headers(c) do |t| gcc_compile( o, c ) end
end
TEST_OBJECTS.zip( TEST_SOURCES ).each do |o,c|
file o => [c] + headers(c) do |t| gcc_compile( o, c ) end
end
desc "Remove all build targets, binaries and temporary files"
task :clean do
sh "rm -rf *~ build"
end
namespace :pkg do
deb do |t|
t.code_files = ALL_SOURCES + ["Rakefile", "README.txt", "README.proxy.txt"]
t.pkg_name = "flexnbd"
t.generate_changelog!
end
end

2742
debian/changelog vendored

File diff suppressed because it is too large

10
debian/control vendored

@@ -1,14 +1,14 @@
Source: flexnbd
Section: unknown
Section: web
Priority: extra
Maintainer: Alex Young <alex@bytemark.co.uk>
Build-Depends: cdbs, debhelper (>= 7.0.50), ruby, rake, gcc, libev-dev
Maintainer: Patrick J Cherry <patrick@bytemark.co.uk>
Build-Depends: debhelper (>= 7.0.50), ruby, gcc, libev-dev, txt2man, check, net-tools, libsubunit-dev, ruby-test-unit
Standards-Version: 3.8.1
Homepage: http://bigv.io/
Homepage: https://github.com/BytemarkHosting/flexnbd-c
Package: flexnbd
Architecture: any
Depends: ${shlibs:Depends}, ${misc:Depends}, libev3
Depends: ${shlibs:Depends}, ${misc:Depends}, libev4 | libev3
Description: FlexNBD server
An NBD server offering push-mirroring and intelligent sparse file handling


@@ -1,5 +1,3 @@
build/flexnbd usr/bin
build/flexnbd-proxy usr/bin
build/flexnbd.1.gz usr/share/man/man1
build/flexnbd-proxy.1.gz usr/share/man/man1

2
debian/flexnbd.manpages vendored Normal file

@@ -0,0 +1,2 @@
build/flexnbd.1.gz
build/flexnbd-proxy.1.gz

15
debian/rules vendored

@@ -7,12 +7,13 @@
%:
dh $@
override_dh_auto_build:
rake build
override_dh_auto_clean:
rake clean
.PHONY: override_dh_strip
override_dh_strip:
dh_strip --dbg-package=flexnbd-dbg
#
# TODO: The ruby test suites don't work during building in a chroot, so leave
# them out for now.
#
#override_dh_auto_test:
# rake test:run

108
src/acl.c

@@ -1,108 +0,0 @@
#include <stdlib.h>
#include "util.h"
#include "parse.h"
#include "acl.h"
struct acl * acl_create( int len, char ** lines, int default_deny )
{
struct acl * acl;
acl = (struct acl *)xmalloc( sizeof( struct acl ) );
acl->len = parse_acl( &acl->entries, len, lines );
acl->default_deny = default_deny;
return acl;
}
static int testmasks[9] = { 0,128,192,224,240,248,252,254,255 };
/** Test whether AF_INET or AF_INET6 sockaddr is included in the given access
* control list, returning 1 if it is, and 0 if not.
*/
static int is_included_in_acl(int list_length, struct ip_and_mask (*list)[], union mysockaddr* test)
{
NULLCHECK( test );
int i;
for (i=0; i < list_length; i++) {
struct ip_and_mask *entry = &(*list)[i];
int testbits;
unsigned char *raw_address1, *raw_address2;
debug("checking acl entry %d (%d/%d)", i, test->generic.sa_family, entry->ip.family);
if (test->generic.sa_family != entry->ip.family) {
continue;
}
if (test->generic.sa_family == AF_INET) {
debug("it's an AF_INET");
raw_address1 = (unsigned char*) &test->v4.sin_addr;
raw_address2 = (unsigned char*) &entry->ip.v4.sin_addr;
}
else if (test->generic.sa_family == AF_INET6) {
debug("it's an AF_INET6");
raw_address1 = (unsigned char*) &test->v6.sin6_addr;
raw_address2 = (unsigned char*) &entry->ip.v6.sin6_addr;
}
else {
fatal( "Can't check an ACL for this address type." );
}
debug("testbits=%d", entry->mask);
for (testbits = entry->mask; testbits > 0; testbits -= 8) {
debug("testbits=%d, c1=%02x, c2=%02x", testbits, raw_address1[0], raw_address2[0]);
if (testbits >= 8) {
if (raw_address1[0] != raw_address2[0]) { goto no_match; }
}
else {
if ((raw_address1[0] & testmasks[testbits%8]) !=
(raw_address2[0] & testmasks[testbits%8]) ) {
goto no_match;
}
}
raw_address1++;
raw_address2++;
}
return 1;
no_match: ;
debug("no match");
}
return 0;
}
int acl_includes( struct acl * acl, union mysockaddr * addr )
{
NULLCHECK( acl );
if ( 0 == acl->len ) {
return !( acl->default_deny );
}
else {
return is_included_in_acl( acl->len, acl->entries, addr );
}
}
int acl_default_deny( struct acl * acl )
{
NULLCHECK( acl );
return acl->default_deny;
}
void acl_destroy( struct acl * acl )
{
free( acl->entries );
acl->len = 0;
acl->entries = NULL;
free( acl );
}
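The byte-by-byte mask comparison in is_included_in_acl() can be isolated into a small standalone function, which makes the prefix logic easier to test. This is a sketch, not the project's code: `prefix_matches` is a name invented here, and it works on raw address bytes rather than on `union mysockaddr`.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if the first `bits` bits of `a` and `b` are equal, 0 otherwise.
 * Both buffers must hold at least (bits + 7) / 8 bytes. */
int prefix_matches(const unsigned char *a, const unsigned char *b, int bits)
{
    /* masks[n] keeps the top n bits of a byte */
    static const unsigned char masks[9] = { 0, 128, 192, 224, 240, 248, 252, 254, 255 };

    assert(a != NULL && b != NULL && bits >= 0);
    for (; bits > 0; bits -= 8, a++, b++) {
        /* Whole bytes are compared in full; the final partial byte is masked. */
        unsigned char mask = bits >= 8 ? 255 : masks[bits];
        if ((*a & mask) != (*b & mask)) {
            return 0;
        }
    }
    return 1;
}
```

A prefix length of 0 never enters the loop, so it matches any address, which is the usual meaning of a /0 entry.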


@@ -1,415 +0,0 @@
#ifndef BITSET_H
#define BITSET_H
#include "util.h"
#include <inttypes.h>
#include <string.h>
#include <pthread.h>
static inline char char_with_bit_set(uint64_t num) { return 1<<(num%8); }
/** Return 1 if the bit at ''idx'' in array ''b'' is set */
static inline int bit_is_set(char* b, uint64_t idx) {
return (b[idx/8] & char_with_bit_set(idx)) != 0;
}
/** Return 1 if the bit at ''idx'' in array ''b'' is clear */
static inline int bit_is_clear(char* b, uint64_t idx) {
return !bit_is_set(b, idx);
}
/** Tests whether the bit at ''idx'' in array ''b'' has value ''value'' */
static inline int bit_has_value(char* b, uint64_t idx, int value) {
if (value) { return bit_is_set(b, idx); }
else { return bit_is_clear(b, idx); }
}
/** Sets the bit ''idx'' in array ''b'' */
static inline void bit_set(char* b, uint64_t idx) {
b[idx/8] |= char_with_bit_set(idx);
//__sync_fetch_and_or(b+(idx/8), char_with_bit_set(idx));
}
/** Clears the bit ''idx'' in array ''b'' */
static inline void bit_clear(char* b, uint64_t idx) {
b[idx/8] &= ~char_with_bit_set(idx);
//__sync_fetch_and_nand(b+(idx/8), char_with_bit_set(idx));
}
/** Sets ''len'' bits in array ''b'' starting at offset ''from'' */
static inline void bit_set_range(char* b, uint64_t from, uint64_t len)
{
for ( ; from%8 != 0 && len > 0 ; len-- ) {
bit_set( b, from++ );
}
if (len >= 8) {
memset(b+(from/8), 255, len/8 );
from += len;
len = (len%8);
from -= len;
}
for ( ; len > 0 ; len-- ) {
bit_set( b, from++ );
}
}
/** Clears ''len'' bits in array ''b'' starting at offset ''from'' */
static inline void bit_clear_range(char* b, uint64_t from, uint64_t len)
{
for ( ; from%8 != 0 && len > 0 ; len-- ) {
bit_clear( b, from++ );
}
if (len >= 8) {
memset(b+(from/8), 0, len/8 );
from += len;
len = (len%8);
from -= len;
}
for ( ; len > 0 ; len-- ) {
bit_clear( b, from++ );
}
}
/** Counts the number of contiguous bits in array ''b'', starting at ''from''
* up to a maximum number of bits ''len''. Returns the number of contiguous
* bits that are the same as the first one specified. If ''run_is_set'' is
* non-NULL, the value of that bit is placed into it.
*/
static inline uint64_t bit_run_count(char* b, uint64_t from, uint64_t len, int *run_is_set) {
uint64_t* current_block;
uint64_t count = 0;
int first_value = bit_is_set(b, from);
if ( run_is_set != NULL ) {
*run_is_set = first_value;
}
for ( ; (from+count) % 64 != 0 && len > 0; len--) {
if (bit_has_value(b, from+count, first_value)) {
count++;
} else {
return count;
}
}
for ( ; len >= 64 ; len -= 64 ) {
current_block = (uint64_t*) (b + ((from+count)/8));
if (*current_block == ( first_value ? UINT64_MAX : 0 ) ) {
count += 64;
} else {
break;
}
}
for ( ; len > 0; len-- ) {
if ( bit_has_value(b, from+count, first_value) ) {
count++;
} else {
/* The run has ended; re-testing the same bit cannot extend it. */
break;
}
}
return count;
}
enum bitset_stream_events {
BITSET_STREAM_UNSET = 0,
BITSET_STREAM_SET = 1,
BITSET_STREAM_ON = 2,
BITSET_STREAM_OFF = 3
};
struct bitset_stream_entry {
enum bitset_stream_events event;
uint64_t from;
uint64_t len;
};
/** Limit the stream size to 1MB for now.
*
* If this is too small, it'll cause requests to stall as the migration lags
* behind the changes made by those requests.
*/
#define BITSET_STREAM_SIZE ( ( 1024 * 1024 ) / sizeof( struct bitset_stream_entry ) )
struct bitset_stream {
struct bitset_stream_entry entries[BITSET_STREAM_SIZE];
int in;
int out;
int size;
pthread_mutex_t mutex;
pthread_cond_t cond_not_full;
pthread_cond_t cond_not_empty;
};
/** An application of a bitset - a bitset mapping represents a file of ''size''
* broken down into ''resolution''-sized chunks. The bit set is assumed to
* represent one bit per chunk. We also bundle a lock so that the set can be
* written reliably by multiple threads.
*/
struct bitset {
pthread_mutex_t lock;
uint64_t size;
int resolution;
struct bitset_stream *stream;
int stream_enabled;
char bits[];
};
/** Allocate a bitset for a file of the given size, and chunks of the
* given resolution.
*/
static inline struct bitset *bitset_alloc( uint64_t size, int resolution )
{
struct bitset *bitset = xmalloc(
sizeof( struct bitset ) + ( size + resolution - 1 ) / resolution
);
bitset->size = size;
bitset->resolution = resolution;
/* don't actually need to call pthread_mutex_destroy */
pthread_mutex_init(&bitset->lock, NULL);
bitset->stream = xmalloc( sizeof( struct bitset_stream ) );
pthread_mutex_init( &bitset->stream->mutex, NULL );
/* Technically don't need to call pthread_cond_destroy either */
pthread_cond_init( &bitset->stream->cond_not_full, NULL );
pthread_cond_init( &bitset->stream->cond_not_empty, NULL );
return bitset;
}
static inline void bitset_free( struct bitset * set )
{
/* TODO: free our mutex... */
free( set->stream );
set->stream = NULL;
free( set );
}
#define INT_FIRST_AND_LAST \
uint64_t first = from/set->resolution, \
last = ((from+len)-1)/set->resolution, \
bitlen = (last-first)+1
#define BITSET_LOCK \
FATAL_IF_NEGATIVE(pthread_mutex_lock(&set->lock), "Error locking bitset")
#define BITSET_UNLOCK \
FATAL_IF_NEGATIVE(pthread_mutex_unlock(&set->lock), "Error unlocking bitset")
static inline void bitset_stream_enqueue(
struct bitset * set,
enum bitset_stream_events event,
uint64_t from,
uint64_t len
)
{
struct bitset_stream * stream = set->stream;
pthread_mutex_lock( &stream->mutex );
while ( stream->size == BITSET_STREAM_SIZE ) {
pthread_cond_wait( &stream->cond_not_full, &stream->mutex );
}
stream->entries[stream->in].event = event;
stream->entries[stream->in].from = from;
stream->entries[stream->in].len = len;
stream->size++;
stream->in++;
stream->in %= BITSET_STREAM_SIZE;
pthread_mutex_unlock( & stream->mutex );
pthread_cond_broadcast( &stream->cond_not_empty );
return;
}
static inline void bitset_stream_dequeue(
struct bitset * set,
struct bitset_stream_entry * out
)
{
struct bitset_stream * stream = set->stream;
pthread_mutex_lock( &stream->mutex );
while ( stream->size == 0 ) {
pthread_cond_wait( &stream->cond_not_empty, &stream->mutex );
}
if ( out != NULL ) {
out->event = stream->entries[stream->out].event;
out->from = stream->entries[stream->out].from;
out->len = stream->entries[stream->out].len;
}
stream->size--;
stream->out++;
stream->out %= BITSET_STREAM_SIZE;
pthread_mutex_unlock( &stream->mutex );
pthread_cond_broadcast( &stream->cond_not_full );
return;
}
static inline size_t bitset_stream_size( struct bitset * set )
{
size_t size;
pthread_mutex_lock( &set->stream->mutex );
size = set->stream->size;
pthread_mutex_unlock( &set->stream->mutex );
return size;
}
static inline uint64_t bitset_stream_queued_bytes(
struct bitset * set,
enum bitset_stream_events event
)
{
uint64_t total = 0;
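int i, n;
pthread_mutex_lock( &set->stream->mutex );
/* Walk ''size'' entries from ''out'', wrapping at the end of the ring, so
 * that a wrapped ( in < out ) or full ( in == out ) queue is still counted. */
for ( n = 0, i = set->stream->out; n < set->stream->size; n++, i = ( i + 1 ) % BITSET_STREAM_SIZE ) {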
if ( set->stream->entries[i].event == event ) {
total += set->stream->entries[i].len;
}
}
pthread_mutex_unlock( &set->stream->mutex );
return total;
}
static inline void bitset_enable_stream( struct bitset * set )
{
BITSET_LOCK;
set->stream_enabled = 1;
bitset_stream_enqueue( set, BITSET_STREAM_ON, 0, set->size );
BITSET_UNLOCK;
}
static inline void bitset_disable_stream( struct bitset * set )
{
BITSET_LOCK;
bitset_stream_enqueue( set, BITSET_STREAM_OFF, 0, set->size );
set->stream_enabled = 0;
BITSET_UNLOCK;
}
/** Set the bits in a bitset which correspond to the given bytes in the larger
* file.
*/
static inline void bitset_set_range(
struct bitset * set,
uint64_t from,
uint64_t len)
{
INT_FIRST_AND_LAST;
BITSET_LOCK;
bit_set_range(set->bits, first, bitlen);
if ( set->stream_enabled ) {
bitset_stream_enqueue( set, BITSET_STREAM_SET, from, len );
}
BITSET_UNLOCK;
}
/** Set every bit in the bitset. */
static inline void bitset_set( struct bitset * set )
{
bitset_set_range(set, 0, set->size);
}
/** Clear the bits in a bitset which correspond to the given bytes in the
* larger file.
*/
static inline void bitset_clear_range(
struct bitset * set,
uint64_t from,
uint64_t len)
{
INT_FIRST_AND_LAST;
BITSET_LOCK;
bit_clear_range(set->bits, first, bitlen);
if ( set->stream_enabled ) {
bitset_stream_enqueue( set, BITSET_STREAM_UNSET, from, len );
}
BITSET_UNLOCK;
}
/** Clear every bit in the bitset. */
static inline void bitset_clear( struct bitset * set )
{
bitset_clear_range(set, 0, set->size);
}
/** As per bitset_run_count but also tells you whether the run it found was set
* or unset, atomically.
*/
static inline uint64_t bitset_run_count_ex(
struct bitset * set,
uint64_t from,
uint64_t len,
int* run_is_set
)
{
uint64_t run;
/* Clip our requests to the end of the bitset, avoiding uint underflow. */
if ( from > set->size ) {
return 0;
}
len = ( len + from ) > set->size ? ( set->size - from ) : len;
INT_FIRST_AND_LAST;
BITSET_LOCK;
run = bit_run_count(set->bits, first, bitlen, run_is_set) * set->resolution;
run -= (from % set->resolution);
BITSET_UNLOCK;
return run;
}
/** Counts the number of contiguous bytes that are represented as a run in
* the bit field.
*/
static inline uint64_t bitset_run_count(
struct bitset * set,
uint64_t from,
uint64_t len)
{
return bitset_run_count_ex( set, from, len, NULL );
}
/** Tests whether the bit field is clear for the given file offset.
*/
static inline int bitset_is_clear_at( struct bitset * set, uint64_t at )
{
return bit_is_clear(set->bits, at/set->resolution);
}
/** Tests whether the bit field is set for the given file offset.
*/
static inline int bitset_is_set_at( struct bitset * set, uint64_t at )
{
return bit_is_set(set->bits, at/set->resolution);
}
#endif
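The INT_FIRST_AND_LAST macro is what turns a byte range in the file into a range of chunk bits. The arithmetic can be tried in isolation with the sketch below; `struct chunk_span` and `bytes_to_chunks` are names made up for this note and do not appear in the header.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper mirroring the arithmetic in INT_FIRST_AND_LAST:
 * which chunk indices does the byte range [from, from+len) touch? */
struct chunk_span {
    uint64_t first;  /* index of the first chunk touched */
    uint64_t count;  /* number of chunks touched */
};

struct chunk_span bytes_to_chunks(uint64_t from, uint64_t len, uint64_t resolution)
{
    struct chunk_span span;
    uint64_t last;

    assert(resolution > 0);
    /* len == 0 is not meaningful here: (from + len) - 1 underflows
     * when from is also 0. */
    assert(len > 0);

    span.first = from / resolution;
    last = (from + len - 1) / resolution;
    span.count = last - span.first + 1;
    return span;
}
```

For example, with a resolution of 4096 a 200-byte write starting at byte 4000 straddles a chunk boundary and so touches two chunks, while a 4096-byte write starting at byte 4096 touches exactly one.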


@@ -1,664 +0,0 @@
#include "client.h"
#include "serve.h"
#include "ioutil.h"
#include "sockutil.h"
#include "util.h"
#include "bitset.h"
#include "nbdtypes.h"
#include "self_pipe.h"
#include <sys/mman.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
struct client *client_create( struct server *serve, int socket )
{
NULLCHECK( serve );
struct client *c;
struct sigevent evp = {
.sigev_notify = SIGEV_SIGNAL,
.sigev_signo = CLIENT_KILLSWITCH_SIGNAL
};
c = xmalloc( sizeof( struct client ) );
c->stopped = 0;
c->socket = socket;
c->serve = serve;
c->stop_signal = self_pipe_create();
FATAL_IF_NEGATIVE(
timer_create( CLOCK_MONOTONIC, &evp, &(c->killswitch) ),
SHOW_ERRNO( "Failed to create killswitch timer" )
);
debug( "Alloced client %p with socket %d", c, socket );
return c;
}
void client_signal_stop( struct client *c)
{
NULLCHECK( c);
debug("client %p: signal stop (%d, %d)", c,c->stop_signal->read_fd, c->stop_signal->write_fd );
self_pipe_signal( c->stop_signal );
}
void client_destroy( struct client *client )
{
NULLCHECK( client );
FATAL_IF_NEGATIVE(
timer_delete( client->killswitch ),
SHOW_ERRNO( "Couldn't delete killswitch" )
);
debug( "Destroying stop signal for client %p", client );
self_pipe_destroy( client->stop_signal );
debug( "Freeing client %p", client );
free( client );
}
/**
* So waiting on client->socket is len bytes of data, and we must write it all
* to client->mapped. However while doing so we must consult the bitmap
* client->serve->allocation_map, which is a bitmap where one bit represents
* block_allocation_resolution bytes. Where a bit isn't set, there are no
* disc blocks allocated for that portion of the file, and we'd like to keep
* it that way.
*
* If the bitmap shows that every block in our prospective write is already
* allocated, we can proceed as normal and make one call to writeloop.
*
*/
void write_not_zeroes(struct client* client, uint64_t from, uint64_t len)
{
NULLCHECK( client );
NULLCHECK( client->serve );
NULLCHECK( client->serve->allocation_map );
struct bitset * map = client->serve->allocation_map;
while (len > 0) {
/* so we have to calculate how much of our input to consider
* next based on the bitmap of allocated blocks. This will be
* at a coarser resolution than the actual write, which may
* not fall on a block boundary at either end. So we look up
* how many blocks our write covers, then cut off the start
* and end to get the exact number of bytes.
*/
uint64_t run = bitset_run_count(map, from, len);
debug("write_not_zeroes: from=%ld, len=%d, run=%d", from, len, run);
if (run > len) {
run = len;
debug("(run adjusted to %d)", run);
}
if (0) /* useful but expensive */
{
uint64_t i;
fprintf(stderr, "full map resolution=%d: ", map->resolution);
for (i=0; i<client->serve->size; i+=map->resolution) {
int here = (from >= i && from < i+map->resolution);
if (here) { fprintf(stderr, ">"); }
fprintf(stderr, bitset_is_set_at(map, i) ? "1" : "0");
if (here) { fprintf(stderr, "<"); }
}
fprintf(stderr, "\n");
}
#define DO_READ(dst, len) ERROR_IF_NEGATIVE( \
readloop( \
client->socket, \
(dst), \
(len) \
), \
"read failed %ld+%d", from, (len) \
)
if (bitset_is_set_at(map, from)) {
debug("writing the lot: from=%ld, run=%d", from, run);
/* already allocated, just write it all */
DO_READ(client->mapped + from, run);
/* We know from our earlier call to bitset_run_count that the
* bitset is all-1s at this point, but we need to dirty it for the
* sake of the event stream - the actual bytes have changed, and we
* are interested in that fact.
*/
bitset_set_range( map, from, run );
len -= run;
from += run;
}
else {
char zerobuffer[block_allocation_resolution];
/* not allocated, read in block_allocation_resolution */
while (run > 0) {
uint64_t blockrun = block_allocation_resolution -
(from % block_allocation_resolution);
if (blockrun > run)
blockrun = run;
DO_READ(zerobuffer, blockrun);
/* This reads the buffer twice in the worst case
* but we're leaning on memcmp failing early
* and memcpy being fast, rather than try to
* hand-optimize something specific.
*/
int all_zeros = (zerobuffer[0] == 0) &&
(0 == memcmp( zerobuffer, zerobuffer+1, blockrun-1 ));
if ( !all_zeros ) {
memcpy(client->mapped+from, zerobuffer, blockrun);
bitset_set_range(map, from, blockrun);
/* at this point we could choose to
* short-cut the rest of the write for
* faster I/O but by continuing to do it
* the slow way we preserve as much
* sparseness as possible.
*/
}
/* When the block is all_zeroes, no bytes have changed, so we
* don't need to put an event into the bitset stream. This may
* be surprising in the future.
*/
len -= blockrun;
run -= blockrun;
from += blockrun;
}
}
}
}
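The all_zeros test in write_not_zeroes() relies on a compact idiom: if the first byte is zero and the buffer compares equal to itself shifted by one byte, every byte must be zero. A standalone sketch of the same check follows; `buffer_is_all_zeros` is a name invented for this note, and the empty-buffer case is handled explicitly here as a choice of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if all `len` bytes of `buf` are zero, 0 otherwise. */
int buffer_is_all_zeros(const char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    if (len == 0) {
        return 1;   /* an empty buffer has no non-zero bytes */
    }
    /* If buf[0] is zero and every byte equals its successor
     * (buf[i] == buf[i+1] for i in 0..len-2), then all bytes are zero.
     * memcmp only reads, so the overlapping ranges are fine. */
    return buf[0] == 0 && memcmp(buf, buf + 1, len - 1) == 0;
}
```

The comparison stops at the first differing byte in typical implementations, which is why the original comment speaks of leaning on memcmp failing early.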
int fd_read_request( int fd, struct nbd_request_raw *out_request)
{
return readloop(fd, out_request, sizeof(struct nbd_request_raw));
}
/* Returns 1 if *request was filled with a valid request which we should
* try to honour. 0 otherwise. */
int client_read_request( struct client * client , struct nbd_request *out_request, int * disconnected )
{
NULLCHECK( client );
NULLCHECK( out_request );
struct nbd_request_raw request_raw;
fd_set fds;
struct timeval * ptv = NULL;
int fd_count;
/* We want a timeout if this is an inbound migration, but not otherwise.
* This is compile-time selectable, as it will break mirror max_bps
*/
#ifdef HAS_LISTEN_TIMEOUT
struct timeval tv = {CLIENT_MAX_WAIT_SECS, 0};
if ( !server_is_in_control( client->serve ) ) {
ptv = &tv;
}
#endif
FD_ZERO(&fds);
FD_SET(client->socket, &fds);
self_pipe_fd_set( client->stop_signal, &fds );
fd_count = sock_try_select(FD_SETSIZE, &fds, NULL, NULL, ptv);
if ( fd_count == 0 ) {
/* This "can't ever happen" */
if ( NULL == ptv ) { fatal( "No FDs selected, and no timeout!" ); }
else { error("Timed out waiting for I/O"); }
}
else if ( fd_count < 0 ) { fatal( "Select failed" ); }
if ( self_pipe_fd_isset( client->stop_signal, &fds ) ){
debug("Client received stop signal.");
return 0;
}
if (fd_read_request(client->socket, &request_raw) == -1) {
*disconnected = 1;
switch( errno ){
case 0:
debug( "EOF while reading request" );
return 0;
case ECONNRESET:
debug( "Connection reset while"
" reading request" );
return 0;
default:
/* FIXME: I've seen this happen, but I
* couldn't reproduce it so I'm leaving
* it here with a better debug output in
* the hope it'll spontaneously happen
* again. It should *probably* be an
* error() call, but I want to be sure.
* */
fatal("Error reading request: %d, %s",
errno,
strerror( errno ));
}
}
nbd_r2h_request( &request_raw, out_request );
return 1;
}
int fd_write_reply( int fd, char *handle, int error )
{
struct nbd_reply reply;
struct nbd_reply_raw reply_raw;
reply.magic = REPLY_MAGIC;
reply.error = error;
memcpy( reply.handle, handle, 8 );
nbd_h2r_reply( &reply, &reply_raw );
debug( "Replying with %s, %d", handle, error );
if( -1 == writeloop( fd, &reply_raw, sizeof( reply_raw ) ) ) {
switch( errno ) {
case ECONNRESET:
error( "Connection reset while writing reply" );
break;
case EBADF:
fatal( "Tried to write to an invalid file descriptor" );
break;
case EPIPE:
error( "Remote end closed" );
break;
default:
fatal( "Unhandled error while writing: %d", errno );
}
}
return 1;
}
/* Writes a reply to request *request, with error, to the client's
* socket.
* Returns 1; we don't check for errors on the write.
* TODO: Check for errors on the write.
*/
int client_write_reply( struct client * client, struct nbd_request *request, int error )
{
return fd_write_reply( client->socket, request->handle, error);
}
void client_write_init( struct client * client, uint64_t size )
{
struct nbd_init init = {{0}};
struct nbd_init_raw init_raw = {{0}};
memcpy( init.passwd, INIT_PASSWD, sizeof( INIT_PASSWD ) );
init.magic = INIT_MAGIC;
init.size = size;
memset( init.reserved, 0, 128 );
nbd_h2r_init( &init, &init_raw );
ERROR_IF_NEGATIVE(
writeloop(client->socket, &init_raw, sizeof(init_raw)),
"Couldn't send hello"
);
}
/* Remove len bytes from the client socket. This is needed when the
* client sends a write we can't honour - we need to get rid of the
* bytes they've already written before we can look for another request.
*/
void client_flush( struct client * client, size_t len )
{
int devnull = open("/dev/null", O_WRONLY);
FATAL_IF_NEGATIVE( devnull,
"Couldn't open /dev/null: %s", strerror(errno));
int pipes[2];
FATAL_IF_NEGATIVE( pipe( pipes ),
"Couldn't create pipe: %s", strerror(errno));
const unsigned int flags = SPLICE_F_MORE | SPLICE_F_MOVE;
size_t spliced = 0;
while ( spliced < len ) {
ssize_t received = splice(
client->socket, NULL,
pipes[1], NULL,
len-spliced, flags );
FATAL_IF_NEGATIVE( received,
"splice error: %s",
strerror(errno));
ssize_t junked = 0;
while( junked < received ) {
ssize_t junk;
junk = splice(
pipes[0], NULL,
devnull, NULL,
received - junked, flags );
FATAL_IF_NEGATIVE( junk,
"splice error: %s",
strerror(errno));
junked += junk;
}
spliced += received;
}
debug("Flushed %zu bytes", len);
close( pipes[0] );
close( pipes[1] );
close( devnull );
}
/* Check to see if the client's request needs a reply constructing.
* Returns 1 if we do, 0 otherwise.
* client->disconnect is set to 1 if the client sent a bad magic or asked
* to disconnect, in which case we drop the connection.
*/
int client_request_needs_reply( struct client * client,
struct nbd_request request )
{
/* The client is stupid, but don't take down the whole server as a result.
* We send a reply before disconnecting so that at least some indication of
* the problem is visible, and so proxies don't retry the same (bad) request
* forever.
*/
if (request.magic != REQUEST_MAGIC) {
warn("Bad magic 0x%08x from client", request.magic);
client_write_reply( client, &request, EBADMSG );
client->disconnect = 1; // no need to flush
return 0;
}
debug(
"request type=%"PRIu32", from=%"PRIu64", len=%"PRIu32,
request.type, request.from, request.len
);
/* check it's not out of range */
if ( request.from+request.len > client->serve->size) {
warn("request %"PRIu64"+%"PRIu32" out of range",
request.from, request.len
);
if ( request.type == REQUEST_WRITE ) {
client_flush( client, request.len );
}
client_write_reply( client, &request, EPERM ); /* TODO: Change to ERANGE ? */
client->disconnect = 0;
return 0;
}
switch (request.type)
{
case REQUEST_READ:
break;
case REQUEST_WRITE:
break;
case REQUEST_DISCONNECT:
debug("request disconnect");
client->disconnect = 1;
return 0;
default:
fatal("Unknown request %08x", request.type);
}
return 1;
}
void client_reply_to_read( struct client* client, struct nbd_request request )
{
off64_t offset;
debug("request read %ld+%d", request.from, request.len);
client_write_reply( client, &request, 0);
offset = request.from;
/* If we get cut off partway through this sendfile, we don't
* want to kill the server. This should be an error.
*/
ERROR_IF_NEGATIVE(
sendfileloop(
client->socket,
client->fileno,
&offset,
request.len),
"sendfile failed from=%ld, len=%d",
offset,
request.len);
}
void client_reply_to_write( struct client* client, struct nbd_request request )
{
debug("request write %ld+%d", request.from, request.len);
if (client->serve->allocation_map_built) {
write_not_zeroes( client, request.from, request.len );
}
else {
debug("No allocation map, writing directly.");
/* If we get cut off partway through reading this data, we don't
* want to kill the server. This should be an error.
* */
ERROR_IF_NEGATIVE(
readloop( client->socket,
client->mapped + request.from,
request.len),
"reading write data failed from=%ld, len=%d",
request.from,
request.len
);
/* the allocation_map is shared between client threads, and may be
* being built. We need to reflect the write in it, as it may be in
* a position the builder has already gone over.
*/
bitset_set_range(client->serve->allocation_map, request.from, request.len);
}
if (1) /* not sure whether this is necessary... */
{
/* multiple of 4K page size */
uint64_t from_rounded = request.from & ~((uint64_t) 0xfff);
uint64_t len_rounded = request.len + (request.from - from_rounded);
FATAL_IF_NEGATIVE(
msync( client->mapped + from_rounded,
len_rounded,
MS_SYNC | MS_INVALIDATE),
"msync failed %"PRIu64" %"PRIu32, request.from, request.len
);
}
client_write_reply( client, &request, 0);
}
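/* A minimal sketch of the page rounding that the msync() call above
* relies on, assuming the 4K page size named in the comment. The helper
* names page_round_down() and page_span_len() are hypothetical and are
* not part of flexnbd. msync() wants a page-aligned start address, so
* the offset is masked with the bitwise complement of 0xfff; the
* logical-not form (!0xfff) evaluates to 0 and would zero the offset.
*/

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_MASK_4K ((uint64_t) 0xfff)

/* Round an offset down to the start of its 4 KiB page. */
static uint64_t page_round_down(uint64_t offset)
{
    return offset & ~PAGE_MASK_4K;
}

/* Length to sync so that [from, from + len) is still covered once the
 * start has been moved back to a page boundary. */
static uint64_t page_span_len(uint64_t from, uint32_t len)
{
    uint64_t start = page_round_down(from);

    assert(start <= from);
    return (uint64_t) len + (from - start);
}
```

/* For a write at 0x1234 of 0x10 bytes this gives a start of 0x1000 and
* a length of 0x244, which covers the written range.
*/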
void client_reply( struct client* client, struct nbd_request request )
{
switch (request.type) {
case REQUEST_READ:
client_reply_to_read( client, request );
break;
case REQUEST_WRITE:
client_reply_to_write( client, request );
break;
}
}
/* Starts a timer that will kill the whole process if disarm is not called
* within a timeout (see CLIENT_HANDLER_TIMEOUT).
*/
void client_arm_killswitch( struct client* client )
{
struct itimerspec its = {
.it_value = { .tv_nsec = 0, .tv_sec = CLIENT_HANDLER_TIMEOUT },
.it_interval = { .tv_nsec = 0, .tv_sec = 0 }
};
if ( !client->serve->use_killswitch ) {
return;
}
debug( "Arming killswitch" );
FATAL_IF_NEGATIVE(
timer_settime( client->killswitch, 0, &its, NULL ),
SHOW_ERRNO( "Failed to arm killswitch" )
);
return;
}
void client_disarm_killswitch( struct client* client )
{
struct itimerspec its = {
.it_value = { .tv_nsec = 0, .tv_sec = 0 },
.it_interval = { .tv_nsec = 0, .tv_sec = 0 }
};
if ( !client->serve->use_killswitch ) {
return;
}
debug( "Disarming killswitch" );
FATAL_IF_NEGATIVE(
timer_settime( client->killswitch, 0, &its, NULL ),
SHOW_ERRNO( "Failed to disarm killswitch" )
);
return;
}
/* Returns 0 if we should continue trying to serve requests */
int client_serve_request(struct client* client)
{
struct nbd_request request = {0};
int stop = 1;
int disconnected = 0;
if ( !client_read_request( client, &request, &disconnected ) ) { return stop; }
if ( disconnected ) { return stop; }
if ( !client_request_needs_reply( client, request ) ) {
return client->disconnect;
}
{
if ( !server_is_closed( client->serve ) ) {
/* We arm / disarm around client_reply() to catch cases where the
* remote peer sends part of a write request data before dying,
* and cases where we send part of read reply data before they die.
*
* That last is theoretical right now, but could break us in the
* same way as a half-write (which causes us to sit in read forever)
*
* We only arm/disarm inside the server io lock because it's common
* during migrations for us to be hanging on that mutex for quite
* a while while the final pass happens - it's held for the entire
* time.
*/
client_arm_killswitch( client );
client_reply( client, request );
client_disarm_killswitch( client );
stop = 0;
}
}
return stop;
}
void client_send_hello(struct client* client)
{
client_write_init( client, client->serve->size );
}
void client_cleanup(struct client* client,
int fatal __attribute__ ((unused)) )
{
info("client cleanup for client %p", client);
if (client->socket) {
FATAL_IF_NEGATIVE( close(client->socket),
"Error closing client socket %d",
client->socket );
debug("Closed client socket fd %d", client->socket);
client->socket = -1;
}
if (client->mapped) {
munmap(client->mapped, client->serve->size);
}
if (client->fileno) {
FATAL_IF_NEGATIVE( close(client->fileno),
"Error closing file %d",
client->fileno );
debug("Closed client file fd %d", client->fileno );
client->fileno = -1;
}
if ( server_acl_locked( client->serve ) ) { server_unlock_acl( client->serve ); }
}
void* client_serve(void* client_uncast)
{
struct client* client = (struct client*) client_uncast;
error_set_handler((cleanup_handler*) client_cleanup, client);
info("client: mmaping file");
FATAL_IF_NEGATIVE(
open_and_mmap(
client->serve->filename,
&client->fileno,
NULL,
(void**) &client->mapped
),
"Couldn't open/mmap file %s: %s", client->serve->filename, strerror( errno )
);
FATAL_IF_NEGATIVE(
madvise( client->mapped, client->serve->size, MADV_RANDOM ),
SHOW_ERRNO( "Failed to madvise() %s", client->serve->filename )
);
debug( "Opened client file fd %d", client->fileno);
debug("client: sending hello");
client_send_hello(client);
debug("client: serving requests");
while (client_serve_request(client) == 0)
;
debug("client: stopped serving requests");
client->stopped = 1;
if ( client->disconnect ){
debug("client: control arrived" );
server_control_arrived( client->serve );
}
debug("Cleaning client %p up normally in thread %p", client, pthread_self());
client_cleanup(client, 0);
debug("Client thread done" );
return NULL;
}


@@ -1,68 +0,0 @@
#ifndef CLIENT_H
#define CLIENT_H
#include <signal.h>
#include <time.h>
#ifdef HAS_LISTEN_TIMEOUT
/** CLIENT_MAX_WAIT_SECS
* This is the length of time an inbound migration will wait for a fresh
* write before assuming the source has Gone Away. Note: it is *not*
* the time from one write to the next, it is the gap between the end of
* one write and the start of the next.
*/
#define CLIENT_MAX_WAIT_SECS 5
#endif
/** CLIENT_HANDLER_TIMEOUT
* This is the length of time (in seconds) any request can be outstanding for.
* If we spend longer than this in a request, the whole server is killed.
*/
#define CLIENT_HANDLER_TIMEOUT 120
/** CLIENT_KILLSWITCH_SIGNAL
* The signal number we use to kill the server when *any* killswitch timer
* fires. We don't actually need to install a signal handler for it, the default
* behaviour is perfectly fine.
*/
#define CLIENT_KILLSWITCH_SIGNAL ( SIGRTMIN + 1 )
struct client {
/* When we call pthread_join, if the thread is already dead
* we can get an ESRCH. Since we have no other way to tell
* if that ESRCH is from a dead thread or a thread that never
* existed, we use a `stopped` flag to indicate a thread which
* did exist, but went away. Only check this after a
* pthread_join call.
*/
int stopped;
int socket;
int fileno;
char* mapped;
struct self_pipe * stop_signal;
struct server* serve; /* FIXME: remove above duplication */
/* Have we seen a REQUEST_DISCONNECT message? */
int disconnect;
/* kill the whole server if a request has been outstanding too long,
* assuming use_killswitch is set in serve
*/
timer_t killswitch;
};
void* client_serve(void* client_uncast);
struct client * client_create( struct server * serve, int socket );
void client_destroy( struct client * client );
void client_signal_stop( struct client * client );
#endif

371
src/common/ioutil.c Normal file

@@ -0,0 +1,371 @@
#include <sys/mman.h>
#include <sys/sendfile.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <linux/fs.h>
#include <linux/fiemap.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include "util.h"
#include "bitset.h"
#include "ioutil.h"
int build_allocation_map(struct bitset *allocation_map, int fd)
{
/* break blocking ioctls down */
const unsigned long max_length = 100 * 1024 * 1024;
const unsigned int max_extents = 1000;
unsigned long offset = 0;
struct {
struct fiemap fiemap;
struct fiemap_extent extents[max_extents];
} fiemap_static;
struct fiemap *fiemap = (struct fiemap *) &fiemap_static;
memset(&fiemap_static, 0, sizeof(fiemap_static));
for (offset = 0; offset < allocation_map->size;) {
fiemap->fm_start = offset;
fiemap->fm_length = max_length;
if (offset + max_length > allocation_map->size) {
fiemap->fm_length = allocation_map->size - offset;
}
fiemap->fm_flags = FIEMAP_FLAG_SYNC;
fiemap->fm_extent_count = max_extents;
fiemap->fm_mapped_extents = 0;
if (ioctl(fd, FS_IOC_FIEMAP, fiemap) < 0) {
debug("Couldn't get fiemap, returning no allocation_map");
return 0; /* it's up to the caller to free the map */
} else {
for (unsigned int i = 0; i < fiemap->fm_mapped_extents; i++) {
bitset_set_range(allocation_map,
fiemap->fm_extents[i].fe_logical,
fiemap->fm_extents[i].fe_length);
}
/* must move the offset on, but careful not to jump max_length
* if we've actually hit max_extents.
*/
if (fiemap->fm_mapped_extents > 0) {
struct fiemap_extent *last =
&fiemap->fm_extents[fiemap->fm_mapped_extents - 1];
offset = last->fe_logical + last->fe_length;
} else {
offset += fiemap->fm_length;
}
}
}
info("Successfully built allocation map");
return 1;
}
int open_and_mmap(const char *filename, int *out_fd, uint64_t * out_size,
void **out_map)
{
/*
* size and out_size are intentionally of different types.
* lseek64() uses off64_t to signal errors in the sign bit.
* Since we check for these errors before trying to assign to
* *out_size, we know *out_size can never go negative.
*/
off64_t size;
/* O_DIRECT should not be used with mmap() */
*out_fd = open(filename, O_RDWR | O_NOATIME);
if (*out_fd < 0) {
warn("open(%s) failed: does it exist?", filename);
return *out_fd;
}
size = lseek64(*out_fd, 0, SEEK_END);
if (size < 0) {
warn("lseek64() failed");
return size;
}
/* If discs are not in multiples of 512, then odd things happen,
* resulting in reads/writes past the ends of files.
*/
if (size != (size & (~0x1ff))) {
warn("file does not fit into 512-byte sectors; the end of the file will be ignored.");
size &= ~0x1ff;
}
if (out_size) {
*out_size = size;
}
if (out_map) {
*out_map = mmap64(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED,
*out_fd, 0);
if (*out_map == MAP_FAILED) {
warn("mmap64() failed");
return -1;
}
debug("opened %s size %ld on fd %d @ %p", filename, size, *out_fd,
*out_map);
} else {
debug("opened %s size %ld on fd %d", filename, size, *out_fd);
}
return 0;
}
int writeloop(int filedes, const void *buffer, size_t size)
{
size_t written = 0;
while (written < size) {
ssize_t result = write(filedes, buffer + written, size - written);
if (result == -1) {
if (errno == EINTR || errno == EAGAIN || errno == EWOULDBLOCK) {
continue; // busy-wait
}
return -1; // failure
}
written += result;
}
return 0;
}
int readloop(int filedes, void *buffer, size_t size)
{
size_t readden = 0;
while (readden < size) {
ssize_t result = read(filedes, buffer + readden, size - readden);
if (result == 0 /* EOF */ ) {
return -1;
}
if (result == -1) {
if (errno == EINTR || errno == EAGAIN || errno == EWOULDBLOCK) {
continue; // busy-wait
}
return -1; // failure
}
readden += result;
}
return 0;
}
int sendfileloop(int out_fd, int in_fd, off64_t * offset, size_t count)
{
size_t sent = 0;
while (sent < count) {
ssize_t result = sendfile64(out_fd, in_fd, offset, count - sent);
debug
("sendfile64(out_fd=%d, in_fd=%d, offset=%p, count-sent=%ld) = %ld",
out_fd, in_fd, offset, count - sent, result);
if (result == -1) {
debug("%s (%i) calling sendfile64()", strerror(errno), errno);
return -1;
}
sent += result;
debug("sent=%ld, count=%ld", sent, count);
}
debug("exiting sendfileloop");
return 0;
}
#include <errno.h>
ssize_t spliceloop(int fd_in, loff_t * off_in, int fd_out,
loff_t * off_out, size_t len, unsigned int flags2)
{
const unsigned int flags = SPLICE_F_MORE | SPLICE_F_MOVE | flags2;
size_t spliced = 0;
//debug("spliceloop(%d, %ld, %d, %ld, %ld)", fd_in, off_in ? *off_in : 0, fd_out, off_out ? *off_out : 0, len);
while (spliced < len) {
ssize_t result =
splice(fd_in, off_in, fd_out, off_out, len - spliced, flags);
if (result < 0) {
//debug("result=%ld (%s), spliced=%ld, len=%ld", result, strerror(errno), spliced, len);
if (errno == EAGAIN && (flags & SPLICE_F_NONBLOCK)) {
return spliced;
} else {
return -1;
}
} else {
spliced += result;
//debug("result=%ld (%s), spliced=%ld, len=%ld", result, strerror(errno), spliced, len);
}
}
return spliced;
}
int splice_via_pipe_loop(int fd_in, int fd_out, size_t len)
{
int pipefd[2]; /* read end, write end */
size_t spliced = 0;
if (pipe(pipefd) == -1) {
return -1;
}
while (spliced < len) {
ssize_t run = len - spliced;
ssize_t s2, s1 = spliceloop(fd_in, NULL, pipefd[1], NULL, run,
SPLICE_F_NONBLOCK);
/*if (run > 65535)
run = 65535; */
if (s1 < 0) {
break;
}
s2 = spliceloop(pipefd[0], NULL, fd_out, NULL, s1, 0);
if (s2 < 0) {
break;
}
spliced += s2;
}
close(pipefd[0]);
close(pipefd[1]);
return spliced < len ? -1 : 0;
}
/* Reads single bytes from fd until either an EOF or a newline appears.
* If an EOF occurs before a newline, returns -1. The line is lost.
* Inserts the read bytes (without the newline) into buf, followed by a
* trailing NUL.
* Returns the number of read bytes: the length of the line without the
* newline, plus the trailing NUL.
*/
int read_until_newline(int fd, char *buf, int bufsize)
{
int cur;
for (cur = 0; cur < bufsize; cur++) {
int result = read(fd, buf + cur, 1);
if (result <= 0) {
return -1;
}
if (buf[cur] == 10) {
buf[cur] = '\0';
break;
}
}
return cur + 1;
}
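/* A usage sketch for the line-reading contract described above, fed
* through a pipe. read_line_fd() is a hypothetical stand-in written for
* this example, not flexnbd's read_until_newline(); it differs in one
* respect, returning -1 when the line does not fit in the buffer.
* demo_read_lines() is likewise hypothetical.
*/

```c
#include <assert.h>
#include <unistd.h>

/* Read one byte at a time until LF, replace the LF with NUL, and return
 * the line length plus one. Returns -1 on EOF, read error, or overflow. */
static int read_line_fd(int fd, char *buf, int bufsize)
{
    int cur;

    assert(buf != NULL && bufsize > 0);
    for (cur = 0; cur < bufsize; cur++) {
        ssize_t result = read(fd, buf + cur, 1);
        if (result <= 0) {
            return -1;
        }
        if (buf[cur] == '\n') {
            buf[cur] = '\0';
            return cur + 1;
        }
    }
    return -1;                  /* line did not fit in the buffer */
}

/* Write "hello", LF, LF into a pipe and read both lines back.
 * Returns 0 if the first line has length 5 (return value 6) and the
 * second is blank (return value 1). */
static int demo_read_lines(char *first, int firstsize)
{
    int fds[2];
    char blank[8];
    int n1, n2;

    if (pipe(fds) == -1) {
        return -1;
    }
    if (write(fds[1], "hello\n\n", 7) != 7) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    close(fds[1]);
    n1 = read_line_fd(fds[0], first, firstsize);
    n2 = read_line_fd(fds[0], blank, (int) sizeof(blank));
    close(fds[0]);
    return (n1 == 6 && n2 == 1) ? 0 : -1;
}
```

/* A return value of 1 marks a blank line, which is the condition
* read_lines_until_blankline() uses to stop.
*/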
int read_lines_until_blankline(int fd, int max_line_length, char ***lines)
{
int lines_count = 0;
char line[max_line_length + 1];
*lines = NULL;
memset(line, 0, max_line_length + 1);
while (1) {
int readden = read_until_newline(fd, line, max_line_length);
/* readden will be:
* 1 for an empty line
* -1 for an eof
* -1 for a read error
*/
if (readden <= 1) {
return lines_count;
}
*lines = xrealloc(*lines, (lines_count + 1) * sizeof(char *));
(*lines)[lines_count] = strdup(line);
if ((*lines)[lines_count][0] == 0) {
return lines_count;
}
lines_count++;
}
}
int fd_is_closed(int fd_in)
{
int errno_old = errno;
int result = fcntl(fd_in, F_GETFL) < 0;
errno = errno_old;
return result;
}
static inline int io_errno_permanent(void)
{
return (errno != EAGAIN && errno != EWOULDBLOCK && errno != EINTR);
}
/* Returns -1 if the operation failed, or the number of bytes read if all is
* well. Note that 0 bytes may be returned. Unlike read(), this is not an EOF! */
ssize_t iobuf_read(int fd, struct iobuf * iobuf, size_t default_size)
{
size_t left;
ssize_t count;
if (iobuf->needle == 0) {
iobuf->size = default_size;
}
left = iobuf->size - iobuf->needle;
debug("Reading %zu of %zu bytes from fd %i", left,
iobuf->size, fd);
count = read(fd, iobuf->buf + iobuf->needle, left);
if (count > 0) {
iobuf->needle += count;
debug("read() returned %zd bytes", count);
} else if (count == 0) {
warn("read() returned EOF on fd %i", fd);
errno = 0;
return -1;
} else if (count == -1) {
if (io_errno_permanent()) {
warn(SHOW_ERRNO("read() failed on fd %i", fd));
} else {
debug(SHOW_ERRNO("read() returned 0 bytes"));
count = 0;
}
}
return count;
}
ssize_t iobuf_write(int fd, struct iobuf * iobuf)
{
size_t left = iobuf->size - iobuf->needle;
ssize_t count;
debug("Writing %zu of %zu bytes to fd %i", left,
iobuf->size, fd);
count = write(fd, iobuf->buf + iobuf->needle, left);
if (count >= 0) {
iobuf->needle += count;
debug("write() returned %zd bytes", count);
} else {
if (io_errno_permanent()) {
warn(SHOW_ERRNO("write() failed on fd %i", fd));
} else {
debug(SHOW_ERRNO("write() returned 0 bytes"));
count = 0;
}
}
return count;
}


@@ -3,16 +3,16 @@
#include <sys/types.h>
struct iobuf {
unsigned char *buf;
size_t size;
size_t needle;
unsigned char *buf;
size_t size;
size_t needle;
};
ssize_t iobuf_read( int fd, struct iobuf* iobuf, size_t default_size );
ssize_t iobuf_write( int fd, struct iobuf* iobuf );
ssize_t iobuf_read(int fd, struct iobuf *iobuf, size_t default_size);
ssize_t iobuf_write(int fd, struct iobuf *iobuf);
#include "serve.h"
struct bitset; /* don't need whole of bitset.h here */
struct bitset; /* don't need whole of bitset.h here */
/** Scan the file opened in ''fd'', set bits in ''allocation_map'' that
* correspond to which blocks are physically allocated on disc (or part-
@@ -20,7 +20,7 @@ struct bitset; /* don't need whole of bitset.h here */
* than you've asked for, any block or part block will count as "allocated"
* with the corresponding bit set. Returns 1 if successful, 0 otherwise.
*/
int build_allocation_map(struct bitset * allocation_map, int fd);
int build_allocation_map(struct bitset *allocation_map, int fd);
/** Repeat a write() operation that succeeds partially until ''size'' bytes
* are written, or an error is returned, when it returns -1 as usual.
@@ -35,10 +35,11 @@ int readloop(int filedes, void *buffer, size_t size);
/** Repeat a sendfile() operation that succeeds partially until ''size'' bytes
* are written, or an error is returned, when it returns -1 as usual.
*/
int sendfileloop(int out_fd, int in_fd, off64_t *offset, size_t count);
int sendfileloop(int out_fd, int in_fd, off64_t * offset, size_t count);
/** Repeat a splice() operation until we have 'len' bytes. */
ssize_t spliceloop(int fd_in, loff_t *off_in, int fd_out, loff_t *off_out, size_t len, unsigned int flags2);
ssize_t spliceloop(int fd_in, loff_t * off_in, int fd_out,
loff_t * off_out, size_t len, unsigned int flags2);
/** Copy ''len'' bytes from ''fd_in'' to ''fd_out'' by creating a temporary
* pipe and using the Linux splice call repeatedly until it has transferred
@@ -50,7 +51,7 @@ int splice_via_pipe_loop(int fd_in, int fd_out, size_t len);
* until an LF character is received, which is written to the buffer at a zero
* byte. Returns -1 on error, or the number of bytes written to the buffer.
*/
int read_until_newline(int fd, char* buf, int bufsize);
int read_until_newline(int fd, char *buf, int bufsize);
/** Read a number of lines using read_until_newline, until an empty line is
* received (i.e. the sequence LF LF). The data is read from ''fd'' and
@@ -65,12 +66,12 @@ int read_lines_until_blankline(int fd, int max_line_length, char ***lines);
* ''out_size'' and the address of the mmap in ''out_map''. If anything goes
* wrong, returns -1 setting errno, otherwise 0.
*/
int open_and_mmap( const char* filename, int* out_fd, off64_t *out_size, void **out_map);
int open_and_mmap(const char *filename, int *out_fd, uint64_t * out_size,
void **out_map);
/** Check to see whether the given file descriptor is closed.
*/
int fd_is_closed( int fd_in );
int fd_is_closed(int fd_in);
#endif


@@ -2,13 +2,14 @@
#define MODE_H
void mode(char* mode, int argc, char **argv);
void mode(char *mode, int argc, char **argv);
#include <getopt.h>
#define GETOPT_ARG(x,s) {(x), 1, 0, (s)}
#define GETOPT_FLAG(x,v) {(x), 0, 0, (v)}
#define GETOPT_ARG(x,s) {(x), required_argument, 0, (s)}
#define GETOPT_FLAG(x,v) {(x), no_argument, 0, (v)}
#define GETOPT_OPTARG(x,s) {(x), optional_argument, 0, (s)}
#define OPT_HELP "help"
#define OPT_ADDR "addr"
@@ -19,6 +20,7 @@ void mode(char* mode, int argc, char **argv);
#define OPT_FROM "from"
#define OPT_SIZE "size"
#define OPT_DENY "default-deny"
#define OPT_CACHE "cache"
#define OPT_UNLINK "unlink"
#define OPT_CONNECT_ADDR "conn-addr"
#define OPT_CONNECT_PORT "conn-port"
@@ -52,6 +54,7 @@ void mode(char* mode, int argc, char **argv);
#define GETOPT_FROM GETOPT_ARG( OPT_FROM, 'F' )
#define GETOPT_SIZE GETOPT_ARG( OPT_SIZE, 'S' )
#define GETOPT_BIND GETOPT_ARG( OPT_BIND, 'b' )
#define GETOPT_CACHE GETOPT_OPTARG( OPT_CACHE, 'c' )
#define GETOPT_UNLINK GETOPT_ARG( OPT_UNLINK, 'u' )
#define GETOPT_CONNECT_ADDR GETOPT_ARG( OPT_CONNECT_ADDR, 'C' )
#define GETOPT_CONNECT_PORT GETOPT_ARG( OPT_CONNECT_PORT, 'P' )
@@ -65,9 +68,9 @@ void mode(char* mode, int argc, char **argv);
"\t--" OPT_VERBOSE ",-" SOPT_VERBOSE "\t\tOutput debug information.\n"
#ifdef DEBUG
# define VERBOSE_LOG_LEVEL 0
#define VERBOSE_LOG_LEVEL 0
#else
# define VERBOSE_LOG_LEVEL 1
#define VERBOSE_LOG_LEVEL 1
#endif
#define QUIET_LOG_LEVEL 4
@@ -88,7 +91,6 @@ void mode(char* mode, int argc, char **argv);
#define MAX_SPEED_LINE \
"\t--" OPT_MAX_SPEED ",-m <bps>\tMaximum speed of the migration, in bytes/sec.\n"
char * help_help_text;
char *help_help_text;
#endif

61
src/common/nbdtypes.c Normal file

@@ -0,0 +1,61 @@
#include "nbdtypes.h"
#include <string.h>
#include <endian.h>
/**
* We intentionally ignore the reserved 124 bytes at the end of the
* init message, since there's nothing we can do with them.
*/
void nbd_r2h_init(struct nbd_init_raw *from, struct nbd_init *to)
{
memcpy(to->passwd, from->passwd, 8);
to->magic = be64toh(from->magic);
to->size = be64toh(from->size);
to->flags = be32toh(from->flags);
}
void nbd_h2r_init(struct nbd_init *from, struct nbd_init_raw *to)
{
memcpy(to->passwd, from->passwd, 8);
to->magic = htobe64(from->magic);
to->size = htobe64(from->size);
to->flags = htobe32(from->flags);
}
void nbd_r2h_request(struct nbd_request_raw *from, struct nbd_request *to)
{
to->magic = be32toh(from->magic);
to->flags = be16toh(from->flags);
to->type = be16toh(from->type);
to->handle.w = from->handle.w;
to->from = be64toh(from->from);
to->len = be32toh(from->len);
}
void nbd_h2r_request(struct nbd_request *from, struct nbd_request_raw *to)
{
to->magic = htobe32(from->magic);
to->flags = htobe16(from->flags);
to->type = htobe16(from->type);
to->handle.w = from->handle.w;
to->from = htobe64(from->from);
to->len = htobe32(from->len);
}
void nbd_r2h_reply(struct nbd_reply_raw *from, struct nbd_reply *to)
{
to->magic = be32toh(from->magic);
to->error = be32toh(from->error);
to->handle.w = from->handle.w;
}
void nbd_h2r_reply(struct nbd_reply *from, struct nbd_reply_raw *to)
{
to->magic = htobe32(from->magic);
to->error = htobe32(from->error);
to->handle.w = from->handle.w;
}
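/* A sketch of what the _h2r_ conversion produces on the wire for a
* request, assuming the 28-byte packed layout of struct nbd_request_raw
* (magic, flags, type, handle, from, len). encode_request() and
* SKETCH_REQUEST_MAGIC are hypothetical names used only here. Every
* integer field goes out big-endian; the handle is opaque and is copied
* unchanged.
*/

```c
#define _DEFAULT_SOURCE         /* for htobe16/32/64 in <endian.h> */
#include <assert.h>
#include <endian.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_REQUEST_MAGIC 0x25609513u

/* Serialise a request header into out[] and return the byte count. */
static size_t encode_request(uint8_t out[28], uint16_t flags,
                             uint16_t type, uint64_t handle,
                             uint64_t from, uint32_t len)
{
    uint32_t magic_be = htobe32(SKETCH_REQUEST_MAGIC);
    uint16_t flags_be = htobe16(flags);
    uint16_t type_be = htobe16(type);
    uint64_t from_be = htobe64(from);
    uint32_t len_be = htobe32(len);
    size_t pos = 0;

    assert(out != NULL);
    memcpy(out + pos, &magic_be, sizeof(magic_be));
    pos += sizeof(magic_be);
    memcpy(out + pos, &flags_be, sizeof(flags_be));
    pos += sizeof(flags_be);
    memcpy(out + pos, &type_be, sizeof(type_be));
    pos += sizeof(type_be);
    memcpy(out + pos, &handle, sizeof(handle)); /* no byte swap */
    pos += sizeof(handle);
    memcpy(out + pos, &from_be, sizeof(from_be));
    pos += sizeof(from_be);
    memcpy(out + pos, &len_be, sizeof(len_be));
    pos += sizeof(len_be);
    return pos;
}
```

/* Using memcpy() per field avoids depending on struct packing, at the
* cost of spelling out the layout by hand.
*/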

114
src/common/nbdtypes.h Normal file

@@ -0,0 +1,114 @@
#ifndef __NBDTYPES_H
#define __NBDTYPES_H
/* http://linux.derkeiler.com/Mailing-Lists/Kernel/2003-09/2332.html */
#define INIT_PASSWD "NBDMAGIC"
#define INIT_MAGIC 0x0000420281861253
#define REQUEST_MAGIC 0x25609513
#define REPLY_MAGIC 0x67446698
#define REQUEST_READ 0
#define REQUEST_WRITE 1
#define REQUEST_DISCONNECT 2
#define REQUEST_FLUSH 3
/* values for transmission flag field */
#define FLAG_HAS_FLAGS (1 << 0) /* Flags are there */
#define FLAG_SEND_FLUSH (1 << 2) /* Send FLUSH */
#define FLAG_SEND_FUA (1 << 3) /* Send FUA (Force Unit Access) */
/* values for command flag field */
#define CMD_FLAG_FUA (1 << 0)
#if 0
/* Not yet implemented by flexnbd */
#define REQUEST_TRIM 4
#define REQUEST_WRITE_ZEROES 6
#define FLAG_READ_ONLY (1 << 1) /* Device is read-only */
#define FLAG_ROTATIONAL (1 << 4) /* Use elevator algorithm - rotational media */
#define FLAG_SEND_TRIM (1 << 5) /* Send TRIM (discard) */
#define FLAG_SEND_WRITE_ZEROES (1 << 6) /* Send NBD_CMD_WRITE_ZEROES */
#define FLAG_CAN_MULTI_CONN (1 << 8) /* multiple connections are okay */
#define CMD_FLAG_NO_HOLE (1 << 1)
#endif
/* 32 MiB is the maximum qemu will send you:
* https://github.com/qemu/qemu/blob/v2.11.0/include/block/nbd.h#L183
*/
#define NBD_MAX_SIZE ( 32 * 1024 * 1024 )
#define NBD_REQUEST_SIZE ( sizeof( struct nbd_request_raw ) )
#define NBD_REPLY_SIZE ( sizeof( struct nbd_reply_raw ) )
#include <linux/types.h>
#include <inttypes.h>
typedef union nbd_handle_t {
uint8_t b[8];
uint64_t w;
} nbd_handle_t;
/* The _raw types are the types as they appear on the wire. Non-_raw
* types are in host-format.
* Conversion functions are _r2h_ for converting raw to host, and _h2r_
* for converting host to raw.
*/
struct nbd_init_raw {
char passwd[8];
__be64 magic;
__be64 size;
__be32 flags;
char reserved[124];
};
struct nbd_request_raw {
__be32 magic;
__be16 flags;
__be16 type; /* == READ || == WRITE || == DISCONNECT || == FLUSH */
nbd_handle_t handle;
__be64 from;
__be32 len;
} __attribute__ ((packed));
struct nbd_reply_raw {
__be32 magic;
__be32 error; /* 0 = ok, else error */
nbd_handle_t handle; /* handle you got from request */
};
struct nbd_init {
char passwd[8];
uint64_t magic;
uint64_t size;
uint32_t flags;
char reserved[124];
};
struct nbd_request {
uint32_t magic;
uint16_t flags;
uint16_t type; /* == READ || == WRITE || == DISCONNECT || == FLUSH */
nbd_handle_t handle;
uint64_t from;
uint32_t len;
} __attribute__ ((packed));
struct nbd_reply {
uint32_t magic;
uint32_t error; /* 0 = ok, else error */
nbd_handle_t handle; /* handle you got from request */
};
void nbd_r2h_init(struct nbd_init_raw *from, struct nbd_init *to);
void nbd_r2h_request(struct nbd_request_raw *from, struct nbd_request *to);
void nbd_r2h_reply(struct nbd_reply_raw *from, struct nbd_reply *to);
void nbd_h2r_init(struct nbd_init *from, struct nbd_init_raw *to);
void nbd_h2r_request(struct nbd_request *from, struct nbd_request_raw *to);
void nbd_h2r_reply(struct nbd_reply *from, struct nbd_reply_raw *to);
#endif

125
src/common/parse.c Normal file

@@ -0,0 +1,125 @@
#include "parse.h"
#include "util.h"
#include <stdlib.h> /* atoi() */
#define IS_IP_VALID_CHAR(x) ( ((x) >= '0' && (x) <= '9' ) || \
((x) >= 'a' && (x) <= 'f') || \
((x) >= 'A' && (x) <= 'F' ) || \
(x) == ':' || (x) == '.' \
)
/* FIXME: should change this to return negative on error like everything else */
int parse_ip_to_sockaddr(struct sockaddr *out, char *src)
{
NULLCHECK(out);
NULLCHECK(src);
char temp[64];
struct sockaddr_in *v4 = (struct sockaddr_in *) out;
struct sockaddr_in6 *v6 = (struct sockaddr_in6 *) out;
/* allow user to start with [ and end with any other invalid char */
{
int i = 0, j = 0;
if (src[i] == '[') {
i++;
}
/* bound on j so the terminator below always fits inside temp */
for (; j < (int) sizeof(temp) - 1 && IS_IP_VALID_CHAR(src[i]); i++) {
temp[j++] = src[i];
}
temp[j] = 0;
}
if (temp[0] == '0' && temp[1] == '\0') {
v4->sin_family = AF_INET;
v4->sin_addr.s_addr = INADDR_ANY;
return 1;
}
if (inet_pton(AF_INET, temp, &v4->sin_addr) == 1) {
out->sa_family = AF_INET;
return 1;
}
if (inet_pton(AF_INET6, temp, &v6->sin6_addr) == 1) {
out->sa_family = AF_INET6;
return 1;
}
return 0;
}
int parse_to_sockaddr(struct sockaddr *out, char *address)
{
struct sockaddr_un *un = (struct sockaddr_un *) out;
NULLCHECK(address);
if (address[0] == '/') {
un->sun_family = AF_UNIX;
strncpy(un->sun_path, address, sizeof(un->sun_path) - 1);
un->sun_path[sizeof(un->sun_path) - 1] = '\0';
return 1;
}
return parse_ip_to_sockaddr(out, address);
}
int parse_acl(struct ip_and_mask (**out)[], int max, char **entries)
{
struct ip_and_mask *list;
int i;
if (max == 0) {
*out = NULL;
return 0;
} else {
list = xmalloc(max * sizeof(struct ip_and_mask));
*out = (struct ip_and_mask(*)[]) list;
debug("acl alloc: %p", *out);
}
for (i = 0; i < max; i++) {
int j;
struct ip_and_mask *outentry = &list[i];
# define MAX_MASK_BITS (outentry->ip.family == AF_INET ? 32 : 128)
if (parse_ip_to_sockaddr(&outentry->ip.generic, entries[i]) == 0) {
return i;
}
for (j = 0; entries[i][j] && entries[i][j] != '/'; j++); // increment j!
if (entries[i][j] == '/') {
outentry->mask = atoi(entries[i] + j + 1);
if (outentry->mask < 1 || outentry->mask > MAX_MASK_BITS) {
return i;
}
} else {
outentry->mask = MAX_MASK_BITS;
}
# undef MAX_MASK_BITS
debug("acl ptr[%d]: %p %d", i, outentry, outentry->mask);
}
for (i = 0; i < max; i++) {
debug("acl entry %d @ %p has mask %d", i, (void *) &list[i], list[i].mask);
}
return max;
}
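/* A self-contained sketch of parsing one ACL entry of the form
* "address" or "address/bits", in the spirit of parse_acl() above.
* struct acl_sketch and parse_acl_entry() are hypothetical names for
* this example. Unlike the code above it uses strtol() rather than
* atoi(), so trailing garbage after the mask is rejected.
*/

```c
#include <assert.h>
#include <arpa/inet.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>

struct acl_sketch {
    int family;                 /* AF_INET or AF_INET6 */
    int mask;                   /* prefix length in bits */
};

/* Returns 1 on success, 0 if the address or mask is invalid. */
static int parse_acl_entry(const char *entry, struct acl_sketch *out)
{
    char addr[INET6_ADDRSTRLEN];
    unsigned char bin[sizeof(struct in6_addr)];
    const char *slash;
    size_t addr_len;
    int max_bits;

    assert(entry != NULL && out != NULL);
    slash = strchr(entry, '/');
    addr_len = slash ? (size_t) (slash - entry) : strlen(entry);
    if (addr_len == 0 || addr_len >= sizeof(addr)) {
        return 0;
    }
    memcpy(addr, entry, addr_len);
    addr[addr_len] = '\0';

    if (inet_pton(AF_INET, addr, bin) == 1) {
        out->family = AF_INET;
        max_bits = 32;
    } else if (inet_pton(AF_INET6, addr, bin) == 1) {
        out->family = AF_INET6;
        max_bits = 128;
    } else {
        return 0;
    }

    if (slash == NULL) {
        out->mask = max_bits;   /* no mask given: match a single host */
        return 1;
    }

    char *end;
    long bits = strtol(slash + 1, &end, 10);
    if (end == slash + 1 || *end != '\0' || bits < 1 || bits > max_bits) {
        return 0;
    }
    out->mask = (int) bits;
    return 1;
}
```

/* As in parse_acl(), a missing mask defaults to the full address width
* (32 bits for IPv4, 128 for IPv6).
*/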
void parse_port(char *s_port, struct sockaddr_in *out)
{
NULLCHECK(s_port);
int raw_port;
raw_port = atoi(s_port);
if (raw_port < 0 || raw_port > 65535) {
fatal("Port number must be >= 0 and <= 65535");
}
out->sin_port = htobe16(raw_port);
}
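/* parse_port() above relies on atoi(), which returns 0 for
* non-numeric input, so "abc" is accepted as port 0. A stricter
* variant might look like the sketch below. parse_port_strict() is a
* hypothetical name, and it reports failure through its return value
* instead of calling fatal().
*/

```c
#include <assert.h>
#include <arpa/inet.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse a decimal port number into network byte order.
 * Returns 1 on success, 0 if s is not a number in 0..65535. */
static int parse_port_strict(const char *s, uint16_t *out_port_be)
{
    char *end;
    long raw;

    assert(s != NULL && out_port_be != NULL);
    raw = strtol(s, &end, 10);
    if (end == s || *end != '\0') {
        return 0;               /* empty string or trailing garbage */
    }
    if (raw < 0 || raw > 65535) {
        return 0;               /* out of range for a TCP port */
    }
    *out_port_be = htons((uint16_t) raw);
    return 1;
}
```

/* The caller decides whether a bad port is fatal, which keeps the
* parser usable from code paths that must not exit.
*/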

28
src/common/parse.h Normal file

@@ -0,0 +1,28 @@
#ifndef PARSE_H
#define PARSE_H
#include <sys/socket.h>
#include <sys/un.h>
#include <arpa/inet.h>
#include <unistd.h>
union mysockaddr {
unsigned short family;
struct sockaddr generic;
struct sockaddr_in v4;
struct sockaddr_in6 v6;
struct sockaddr_un un;
};
struct ip_and_mask {
union mysockaddr ip;
int mask;
};
int parse_ip_to_sockaddr(struct sockaddr *out, char *src);
int parse_to_sockaddr(struct sockaddr *out, char *src);
int parse_acl(struct ip_and_mask (**out)[], int max, char **entries);
void parse_port(char *s_port, struct sockaddr_in *out);
#endif

268
src/common/readwrite.c Normal file

@@ -0,0 +1,268 @@
#include "nbdtypes.h"
#include "ioutil.h"
#include "sockutil.h"
#include "util.h"
#include "serve.h"
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
int socket_connect(struct sockaddr *to, struct sockaddr *from)
{
int fd =
socket(to->sa_family == AF_INET ? PF_INET : PF_INET6, SOCK_STREAM,
0);
if (fd < 0) {
warn("Couldn't create client socket");
return -1;
}
if (NULL != from) {
if (0 > bind(fd, from, sizeof(struct sockaddr_in6))) {
warn(SHOW_ERRNO("bind() to source address failed"));
if (0 > close(fd)) { /* Non-fatal leak */
warn(SHOW_ERRNO("Failed to close fd %i", fd));
}
return -1;
}
}
if (0 > sock_try_connect(fd, to, sizeof(struct sockaddr_in6), 15)) {
warn(SHOW_ERRNO("connect failed"));
if (0 > close(fd)) { /* Non-fatal leak */
warn(SHOW_ERRNO("Failed to close fd %i", fd));
}
return -1;
}
if (sock_set_tcp_nodelay(fd, 1) == -1) {
warn(SHOW_ERRNO("Failed to set TCP_NODELAY"));
}
return fd;
}
int nbd_check_hello(struct nbd_init_raw *init_raw, uint64_t * out_size,
uint32_t * out_flags)
{
if (strncmp(init_raw->passwd, INIT_PASSWD, 8) != 0) {
warn("wrong passwd");
goto fail;
}
if (be64toh(init_raw->magic) != INIT_MAGIC) {
warn("wrong magic (%" PRIx64 ")", (uint64_t) be64toh(init_raw->magic));
goto fail;
}
if (NULL != out_size) {
*out_size = be64toh(init_raw->size);
}
if (NULL != out_flags) {
*out_flags = be32toh(init_raw->flags);
}
return 1;
fail:
return 0;
}
int socket_nbd_read_hello(int fd, uint64_t * out_size,
uint32_t * out_flags)
{
struct nbd_init_raw init_raw;
if (0 > readloop(fd, &init_raw, sizeof(init_raw))) {
warn(SHOW_ERRNO("Couldn't read init"));
return 0;
}
return nbd_check_hello(&init_raw, out_size, out_flags);
}
void nbd_hello_to_buf(struct nbd_init_raw *buf, off64_t out_size,
uint32_t out_flags)
{
struct nbd_init init;
memcpy(&init.passwd, INIT_PASSWD, 8);
init.magic = INIT_MAGIC;
init.size = out_size;
init.flags = out_flags;
memset(buf, 0, sizeof(struct nbd_init_raw)); // ensure reserved is 0s
nbd_h2r_init(&init, buf);
return;
}
int socket_nbd_write_hello(int fd, off64_t out_size, uint32_t out_flags)
{
struct nbd_init_raw init_raw;
nbd_hello_to_buf(&init_raw, out_size, out_flags);
if (0 > writeloop(fd, &init_raw, sizeof(init_raw))) {
warn(SHOW_ERRNO("failed to write hello to socket"));
return 0;
}
return 1;
}
void fill_request(struct nbd_request_raw *request_raw, uint16_t type,
uint16_t flags, uint64_t from, uint32_t len)
{
request_raw->magic = htobe32(REQUEST_MAGIC);
request_raw->type = htobe16(type);
request_raw->flags = htobe16(flags);
request_raw->handle.w =
(((uint64_t) rand()) << 32) | ((uint64_t) rand());
request_raw->from = htobe64(from);
request_raw->len = htobe32(len);
}
void read_reply(int fd, uint64_t request_raw_handle,
struct nbd_reply *reply)
{
struct nbd_reply_raw reply_raw;
ERROR_IF_NEGATIVE(readloop
(fd, &reply_raw, sizeof(struct nbd_reply_raw)),
SHOW_ERRNO("Couldn't read reply"));
nbd_r2h_reply(&reply_raw, reply);
if (reply->magic != REPLY_MAGIC) {
error("Reply magic incorrect (%x)", reply->magic);
}
if (reply->error != 0) {
error("Server replied with error %d", reply->error);
}
if (request_raw_handle != reply_raw.handle.w) {
error("Did not reply with correct handle");
}
}
void wait_for_data(int fd, int timeout_secs)
{
fd_set fds;
struct timeval tv = { timeout_secs, 0 };
int selected;
FD_ZERO(&fds);
FD_SET(fd, &fds);
selected =
sock_try_select(FD_SETSIZE, &fds, NULL, NULL,
timeout_secs >= 0 ? &tv : NULL);
FATAL_IF(-1 == selected, "Select failed");
ERROR_IF(0 == selected, "Timed out waiting for reply");
}
void socket_nbd_read(int fd, uint64_t from, uint32_t len, int out_fd,
void *out_buf, int timeout_secs)
{
struct nbd_request_raw request_raw;
struct nbd_reply reply;
fill_request(&request_raw, REQUEST_READ, 0, from, len);
FATAL_IF_NEGATIVE(writeloop(fd, &request_raw, sizeof(request_raw)),
SHOW_ERRNO("Couldn't write request"));
wait_for_data(fd, timeout_secs);
read_reply(fd, request_raw.handle.w, &reply);
if (out_buf) {
FATAL_IF_NEGATIVE(readloop(fd, out_buf, len),
SHOW_ERRNO("Read failed"));
} else {
FATAL_IF_NEGATIVE(splice_via_pipe_loop(fd, out_fd, len),
"Splice failed");
}
}
void socket_nbd_write(int fd, uint64_t from, uint32_t len, int in_fd,
void *in_buf, int timeout_secs)
{
struct nbd_request_raw request_raw;
struct nbd_reply reply;
fill_request(&request_raw, REQUEST_WRITE, 0, from, len);
ERROR_IF_NEGATIVE(writeloop(fd, &request_raw, sizeof(request_raw)),
SHOW_ERRNO("Couldn't write request"));
if (in_buf) {
ERROR_IF_NEGATIVE(writeloop(fd, in_buf, len),
SHOW_ERRNO("Write failed"));
} else {
ERROR_IF_NEGATIVE(splice_via_pipe_loop(in_fd, fd, len),
"Splice failed");
}
wait_for_data(fd, timeout_secs);
read_reply(fd, request_raw.handle.w, &reply);
}
int socket_nbd_disconnect(int fd)
{
int success = 1;
struct nbd_request_raw request_raw;
fill_request(&request_raw, REQUEST_DISCONNECT, 0, 0, 0);
/* FIXME: This shouldn't be a FATAL error. We should just drop
* the mirror without affecting the main server.
*/
FATAL_IF_NEGATIVE(writeloop(fd, &request_raw, sizeof(request_raw)),
SHOW_ERRNO
("Failed to write the disconnect request."));
return success;
}
#define CHECK_RANGE(error_type) { \
uint64_t size;\
uint32_t flags;\
int success = socket_nbd_read_hello(params->client, &size, &flags); \
if ( success ) {\
uint64_t endpoint = params->from + params->len; \
if (endpoint > size || \
endpoint < params->from ) { /* this happens on overflow */ \
fatal(error_type \
" request %d+%d is out of range given size %d", \
params->from, params->len, size\
);\
}\
}\
else {\
fatal( error_type " connection failed." );\
}\
}
void do_read(struct mode_readwrite_params *params)
{
params->client =
socket_connect(&params->connect_to.generic,
&params->connect_from.generic);
FATAL_IF_NEGATIVE(params->client, "Couldn't connect.");
CHECK_RANGE("read");
socket_nbd_read(params->client, params->from, params->len,
params->data_fd, NULL, 10);
close(params->client);
}
void do_write(struct mode_readwrite_params *params)
{
params->client =
socket_connect(&params->connect_to.generic,
&params->connect_from.generic);
FATAL_IF_NEGATIVE(params->client, "Couldn't connect.");
CHECK_RANGE("write");
socket_nbd_write(params->client, params->from, params->len,
params->data_fd, NULL, 10);
close(params->client);
}
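
The CHECK_RANGE macro above rejects a request whose end point lies past the export size, or whose end point wrapped around on overflow. The same test can be sketched as a standalone function. The name `nbd_range_ok` is hypothetical and is not part of flexnbd; the parameter widths are assumptions chosen to match the `from`/`len` arguments of socket_nbd_read().

```c
#include <stdint.h>

/* Hypothetical helper (not in flexnbd): returns 1 when the byte range
 * [from, from + len) fits inside an export of `size` bytes, 0 otherwise. */
static int nbd_range_ok(uint64_t from, uint32_t len, uint64_t size)
{
    /* Unsigned arithmetic wraps modulo 2^64 instead of trapping. */
    uint64_t endpoint = from + len;

    if (endpoint < from) {
        /* The sum wrapped past UINT64_MAX, so the range cannot be valid. */
        return 0;
    }
    return endpoint <= size;
}
```

A request that ends exactly at `size` is accepted, which matches the macro's `endpoint > size` rejection condition.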

26
src/common/readwrite.h Normal file

@@ -0,0 +1,26 @@
#ifndef READWRITE_H
#define READWRITE_H
#include <sys/types.h>
#include <sys/socket.h>
#include "nbdtypes.h"
int socket_connect(struct sockaddr *to, struct sockaddr *from);
int socket_nbd_read_hello(int fd, uint64_t * size, uint32_t * flags);
int socket_nbd_write_hello(int fd, uint64_t size, uint32_t flags);
void socket_nbd_read(int fd, uint64_t from, uint32_t len, int out_fd,
void *out_buf, int timeout_secs);
void socket_nbd_write(int fd, uint64_t from, uint32_t len, int in_fd,
void *in_buf, int timeout_secs);
int socket_nbd_disconnect(int fd);
/* as you can see, we're slowly accumulating code that should really be in an
* NBD library */
void nbd_hello_to_buf(struct nbd_init_raw *buf, uint64_t out_size,
uint32_t out_flags);
int nbd_check_hello(struct nbd_init_raw *init_raw, uint64_t * out_size,
uint32_t * out_flags);
#endif

65
src/common/remote.c Normal file

@@ -0,0 +1,65 @@
#include "ioutil.h"
#include "util.h"
#include <stdlib.h>
#include <sys/un.h>
static const int max_response = 1024;
void print_response(const char *response)
{
char *response_text;
FILE *out;
int exit_status;
NULLCHECK(response);
exit_status = atoi(response);
response_text = strchr(response, ':');
FATAL_IF_NULL(response_text,
"Error parsing server response: '%s'", response);
out = exit_status > 0 ? stderr : stdout;
fprintf(out, "%s\n", response_text + 2);
}
void do_remote_command(char *command, char *socket_name, int argc,
char **argv)
{
char newline = 10;
int i;
debug("connecting to run remote command %s", command);
int remote = socket(AF_UNIX, SOCK_STREAM, 0);
struct sockaddr_un address;
char response[max_response];
memset(&address, 0, sizeof(address));
FATAL_IF_NEGATIVE(remote, "Couldn't create client socket");
address.sun_family = AF_UNIX;
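/* address was zeroed above, so copying at most size - 1 bytes keeps
 * sun_path NUL-terminated even for an over-long socket_name */
strncpy(address.sun_path, socket_name, sizeof(address.sun_path) - 1);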
FATAL_IF_NEGATIVE(connect
(remote, (struct sockaddr *) &address,
sizeof(address)), "Couldn't connect to %s",
socket_name);
write(remote, command, strlen(command));
write(remote, &newline, 1);
for (i = 0; i < argc; i++) {
if (NULL != argv[i]) {
write(remote, argv[i], strlen(argv[i]));
}
write(remote, &newline, 1);
}
write(remote, &newline, 1);
FATAL_IF_NEGATIVE(read_until_newline(remote, response, max_response),
"Couldn't read response from %s", socket_name);
print_response(response);
exit(atoi(response));
}
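
do_remote_command() above frames a request as one LF-terminated line per item (the command, then each argument), closes it with a blank line, and expects a single "status: text" line in reply. The sketch below builds such a request in memory and splits a reply. The helpers `append_line`, `build_request` and `split_response` are hypothetical names introduced for illustration; they are not flexnbd functions.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Append `line` plus a trailing LF at buf[*used]. Returns 0 on success,
 * -1 if the remaining space is too small. */
static int append_line(char *buf, size_t buflen, size_t *used,
                       const char *line)
{
    int n = snprintf(buf + *used, buflen - *used, "%s\n", line);

    if (n < 0 || (size_t) n >= buflen - *used) {
        return -1;
    }
    *used += (size_t) n;
    return 0;
}

/* Hypothetical helper: frame a command and its arguments, then add the
 * closing blank line. Returns the request length, or -1 on overflow. */
static int build_request(char *buf, size_t buflen, const char *command,
                         int argc, char **argv)
{
    size_t used = 0;
    int i;

    assert(buf != NULL && command != NULL && buflen > 0);
    if (append_line(buf, buflen, &used, command) < 0) {
        return -1;
    }
    for (i = 0; i < argc; i++) {
        /* A NULL argument becomes an empty line, as in the loop above. */
        if (append_line(buf, buflen, &used, argv[i] ? argv[i] : "") < 0) {
            return -1;
        }
    }
    if (append_line(buf, buflen, &used, "") < 0) {
        return -1;
    }
    return (int) used;
}

/* Hypothetical helper: split "N: text" into a numeric status and a pointer
 * to the text. Returns 0 on success, -1 if there is no ':' separator. */
static int split_response(const char *response, int *status,
                          const char **text)
{
    const char *colon = strchr(response, ':');

    if (colon == NULL) {
        return -1;
    }
    *status = atoi(response);
    /* Skip the ':' and the single space that normally follows it. */
    *text = (colon[1] == ' ') ? colon + 2 : colon + 1;
    return 0;
}
```

Building the request in a buffer first, rather than issuing one write() per line, is a design choice of this sketch only; it makes the framing easy to inspect and lets a caller check a single write result.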

150
src/common/self_pipe.c Normal file

@@ -0,0 +1,150 @@
/**
* self_pipe.c
*
* author: Alex Young <alex@bytemark.co.uk>
*
* Wrapper for the self-pipe trick for select()-based thread
* synchronisation. Get yourself a self_pipe with self_pipe_create(),
* select() on the read end of the pipe with the help of
* self_pipe_fd_set( sig, fds ) and self_pipe_fd_isset( sig, fds ).
* When you've received a signal, clear it with
* self_pipe_signal_clear(sig) so that the buffer doesn't get filled.
*
*/
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include "util.h"
#include "self_pipe.h"
#define ERR_MSG_PIPE "Couldn't open a pipe for signaling."
#define ERR_MSG_FCNTL "Couldn't set a signaling pipe non-blocking."
#define ERR_MSG_WRITE "Couldn't write to a signaling pipe."
#define ERR_MSG_READ "Couldn't read from a signaling pipe."
void self_pipe_server_error(int err, char *msg)
{
char errbuf[1024] = { 0 };
strerror_r(err, errbuf, 1024);
fatal("%s\t%d (%s)", msg, err, errbuf);
}
/**
* Allocate a struct self_pipe, opening the pipe.
*
* Returns NULL if the pipe couldn't be opened or if we couldn't set it
* non-blocking.
*
* Remember to call self_pipe_destroy when you're done with the return
* value.
*/
struct self_pipe *self_pipe_create(void)
{
struct self_pipe *sig = xmalloc(sizeof(struct self_pipe));
int fds[2];
if (NULL == sig) {
return NULL;
}
if (pipe(fds)) {
free(sig);
self_pipe_server_error(errno, ERR_MSG_PIPE);
return NULL;
}
if (fcntl(fds[0], F_SETFL, O_NONBLOCK)
|| fcntl(fds[1], F_SETFL, O_NONBLOCK)) {
int fcntl_err = errno;
while (close(fds[0]) == -1 && errno == EINTR);
while (close(fds[1]) == -1 && errno == EINTR);
free(sig);
self_pipe_server_error(fcntl_err, ERR_MSG_FCNTL);
return NULL;
}
sig->read_fd = fds[0];
sig->write_fd = fds[1];
return sig;
}
/**
* Send a signal to anyone select()ing on this signal.
*
* Returns 1 on success. Can fail if weirdness happened to the write fd
* of the pipe in the self_pipe struct.
*/
int self_pipe_signal(struct self_pipe *sig)
{
NULLCHECK(sig);
FATAL_IF(1 == sig->write_fd, "Shouldn't be writing to stdout");
FATAL_IF(2 == sig->write_fd, "Shouldn't be writing to stderr");
int written = write(sig->write_fd, "X", 1);
if (written != 1) {
self_pipe_server_error(errno, ERR_MSG_WRITE);
return 0;
}
return 1;
}
/**
* Clear a received signal from the pipe. Every signal sent must be
* cleared by one (and only one) recipient when they return from select()
* if the signal is to be used more than once.
* Returns 1 if a signal byte was read and 0 otherwise (no signal was
* pending, or the read failed).
*/
int self_pipe_signal_clear(struct self_pipe *sig)
{
char buf[1];
return 1 == read(sig->read_fd, buf, 1);
}
/**
* Close the pipe and free the self_pipe. Do not try to use the
* self_pipe struct after calling this, the innards are mush.
*/
int self_pipe_destroy(struct self_pipe *sig)
{
NULLCHECK(sig);
while (close(sig->read_fd) == -1 && errno == EINTR);
while (close(sig->write_fd) == -1 && errno == EINTR);
/* Just in case anyone *does* try to use this after free,
* we should set the memory locations to an error value
*/
sig->read_fd = -1;
sig->write_fd = -1;
free(sig);
return 1;
}
int self_pipe_fd_set(struct self_pipe *sig, fd_set * fds)
{
FD_SET(sig->read_fd, fds);
return 1;
}
int self_pipe_fd_isset(struct self_pipe *sig, fd_set * fds)
{
return FD_ISSET(sig->read_fd, fds);
}
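
The header comment of self_pipe.c describes the cycle: create the pipe, select() on its read end, write a byte to signal, and read the byte back to clear. A minimal standalone sketch of that cycle follows, using plain pipe(), fcntl() and select() rather than the self_pipe_* wrappers. The function names `wait_for_signal` and `self_pipe_demo` are hypothetical.

```c
#include <fcntl.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Returns 1 if `read_fd` becomes readable within `timeout_ms`, 0 on
 * timeout, -1 if select() fails. */
static int wait_for_signal(int read_fd, int timeout_ms)
{
    fd_set fds;
    struct timeval tv;
    int n;

    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    FD_ZERO(&fds);
    FD_SET(read_fd, &fds);
    n = select(read_fd + 1, &fds, NULL, NULL, &tv);
    if (n < 0) {
        return -1;
    }
    return FD_ISSET(read_fd, &fds) ? 1 : 0;
}

/* Runs one signal/clear cycle. Returns 1 if every step behaved as the
 * self-pipe trick expects, 0 otherwise. */
static int self_pipe_demo(void)
{
    int fds[2];
    char buf[1];
    int ok;

    if (pipe(fds) != 0) {
        return 0;
    }
    /* Non-blocking on both ends, so neither signalling nor clearing
     * can stall the caller. */
    if (fcntl(fds[0], F_SETFL, O_NONBLOCK)
        || fcntl(fds[1], F_SETFL, O_NONBLOCK)) {
        close(fds[0]);
        close(fds[1]);
        return 0;
    }
    ok = (wait_for_signal(fds[0], 0) == 0);           /* nothing pending */
    ok = ok && (write(fds[1], "X", 1) == 1);          /* send the signal */
    ok = ok && (wait_for_signal(fds[0], 100) == 1);   /* select() wakes up */
    ok = ok && (read(fds[0], buf, 1) == 1);           /* clear the signal */
    ok = ok && (wait_for_signal(fds[0], 0) == 0);     /* pipe is empty again */
    close(fds[0]);
    close(fds[1]);
    return ok;
}
```

In a real program the write would come from another thread or a signal handler while the select() caller also watches its sockets; this sketch runs both halves in one thread only to show the ordering.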

19
src/common/self_pipe.h Normal file

@@ -0,0 +1,19 @@
#ifndef SELF_PIPE_H
#define SELF_PIPE_H
#include <sys/select.h>
struct self_pipe {
int read_fd;
int write_fd;
};
struct self_pipe *self_pipe_create(void);
int self_pipe_signal(struct self_pipe *sig);
int self_pipe_signal_clear(struct self_pipe *sig);
int self_pipe_destroy(struct self_pipe *sig);
int self_pipe_fd_set(struct self_pipe *sig, fd_set * fds);
int self_pipe_fd_isset(struct self_pipe *sig, fd_set * fds);
#endif

295
src/common/sockutil.c Normal file

@@ -0,0 +1,295 @@
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <arpa/inet.h>
#include <netinet/tcp.h>
#include <sys/un.h>
#include "sockutil.h"
#include "util.h"
size_t sockaddr_size(const struct sockaddr * sa)
{
struct sockaddr_un *un = (struct sockaddr_un *) sa;
size_t ret = 0;
switch (sa->sa_family) {
case AF_INET:
ret = sizeof(struct sockaddr_in);
break;
case AF_INET6:
ret = sizeof(struct sockaddr_in6);
break;
case AF_UNIX:
ret = SUN_LEN(un);	/* SUN_LEN already counts the bytes before sun_path */
break;
}
return ret;
}
const char *sockaddr_address_string(const struct sockaddr *sa, char *dest,
size_t len)
{
NULLCHECK(sa);
NULLCHECK(dest);
struct sockaddr_in *in = (struct sockaddr_in *) sa;
struct sockaddr_in6 *in6 = (struct sockaddr_in6 *) sa;
struct sockaddr_un *un = (struct sockaddr_un *) sa;
unsigned short real_port = ntohs(in->sin_port); // common to in and in6
const char *ret = NULL;
memset(dest, 0, len);
if (sa->sa_family == AF_INET) {
ret = inet_ntop(AF_INET, &in->sin_addr, dest, len);
} else if (sa->sa_family == AF_INET6) {
ret = inet_ntop(AF_INET6, &in6->sin6_addr, dest, len);
} else if (sa->sa_family == AF_UNIX) {
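/* Bound the copy by the destination size; dest was zeroed above, so
 * it stays NUL-terminated. */
ret = strncpy(dest, un->sun_path, len > 0 ? len - 1 : 0);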
}
if (ret == NULL) {
strncpy(dest, "???", len);
}
if (NULL != ret && real_port > 0 && sa->sa_family != AF_UNIX) {
size_t size = strlen(dest);
snprintf(dest + size, len - size, " port %d", real_port);
}
return ret;
}
int sock_set_reuseaddr(int fd, int optval)
{
return setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &optval,
sizeof(optval));
}
int sock_set_keepalive_params(int fd, int time, int intvl, int probes)
{
if (sock_set_keepalive(fd, 1) ||
sock_set_tcp_keepidle(fd, time) ||
sock_set_tcp_keepintvl(fd, intvl) ||
sock_set_tcp_keepcnt(fd, probes)) {
return -1;
}
return 0;
}
int sock_set_keepalive(int fd, int optval)
{
return setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &optval,
sizeof(optval));
}
int sock_set_tcp_keepidle(int fd, int optval)
{
return setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &optval,
sizeof(optval));
}
int sock_set_tcp_keepintvl(int fd, int optval)
{
return setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &optval,
sizeof(optval));
}
int sock_set_tcp_keepcnt(int fd, int optval)
{
return setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &optval,
sizeof(optval));
}
/* Set the tcp_nodelay option */
int sock_set_tcp_nodelay(int fd, int optval)
{
return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &optval,
sizeof(optval));
}
int sock_set_tcp_cork(int fd, int optval)
{
return setsockopt(fd, IPPROTO_TCP, TCP_CORK, &optval, sizeof(optval));
}
int sock_set_nonblock(int fd, int optval)
{
int flags = fcntl(fd, F_GETFL);
if (flags == -1) {
return -1;
}
if (optval) {
flags = flags | O_NONBLOCK;
} else {
flags = flags & (~O_NONBLOCK);
}
return fcntl(fd, F_SETFL, flags);
}
int sock_try_bind(int fd, const struct sockaddr *sa)
{
int bind_result;
char s_address[256];
int retry = 10;
sockaddr_address_string(sa, &s_address[0], 256);
do {
bind_result = bind(fd, sa, sockaddr_size(sa));
if (0 == bind_result) {
info("Bound to %s", s_address);
break;
} else {
warn(SHOW_ERRNO("Couldn't bind to %s", s_address));
switch (errno) {
/* bind() can give us EACCES, EADDRINUSE, EADDRNOTAVAIL, EBADF,
* EINVAL, ENOTSOCK, EFAULT, ELOOP, ENAMETOOLONG, ENOENT,
* ENOMEM, ENOTDIR, EROFS
*
* Any of these other than EADDRINUSE & EADDRNOTAVAIL signify
* that there's a logic error somewhere.
*
* EADDRINUSE is fatal: if there's something already where we
* want to be listening, we have no guarantees that any clients
* will cope with it.
*/
case EADDRNOTAVAIL:
retry--;
if (retry) {
debug("retrying");
sleep(1);
}
continue;
case EADDRINUSE:
warn("%s in use, giving up.", s_address);
retry = 0;
break;
default:
warn("giving up");
retry = 0;
}
}
} while (retry);
return bind_result;
}
int sock_try_select(int nfds, fd_set * readfds, fd_set * writefds,
fd_set * exceptfds, struct timeval *timeout)
{
int result;
do {
result = select(nfds, readfds, writefds, exceptfds, timeout);
if (errno != EINTR) {
break;
}
} while (result == -1);
return result;
}
int sock_try_connect(int fd, struct sockaddr *to, socklen_t addrlen,
int wait)
{
fd_set fds;
struct timeval tv = { wait, 0 };
int result = 0;
if (sock_set_nonblock(fd, 1) == -1) {
warn(SHOW_ERRNO
("Failed to set socket non-blocking for connect()"));
return connect(fd, to, addrlen);
}
FD_ZERO(&fds);
FD_SET(fd, &fds);
do {
result = connect(fd, to, addrlen);
if (result == -1) {
switch (errno) {
case EINPROGRESS:
result = 0;
break; /* success */
case EAGAIN:
case EINTR:
/* Try connect() again. This only breaks out of the switch,
* not the do...while loop. since result == -1, we go again.
*/
break;
default:
warn(SHOW_ERRNO("Failed to connect()"));
goto out;
}
}
} while (result == -1);
if (-1 == sock_try_select(FD_SETSIZE, NULL, &fds, NULL, &tv)) {
warn(SHOW_ERRNO("failed to select() on non-blocking connect"));
result = -1;
goto out;
}
if (!FD_ISSET(fd, &fds)) {
result = -1;
errno = ETIMEDOUT;
goto out;
}
int scratch;
socklen_t s_size = sizeof(scratch);
if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &scratch, &s_size) == -1) {
result = -1;
warn(SHOW_ERRNO("getsockopt() failed"));
goto out;
}
if (scratch == EINPROGRESS) {
scratch = ETIMEDOUT;
}
result = scratch ? -1 : 0;
errno = scratch;
out:
if (sock_set_nonblock(fd, 0) == -1) {
warn(SHOW_ERRNO("Failed to make socket blocking after connect()"));
return -1;
}
debug("sock_try_connect: %i", result);
return result;
}
int sock_try_close(int fd)
{
int result;
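/* A `continue` inside do { } while (0) only jumps to the (false)
 * condition, so the retry has to be expressed in the loop condition. */
do {
result = close(fd);
} while (result == -1 && EINTR == errno);	/* retry EINTR */
if (result == -1) {
warn(SHOW_ERRNO("Failed to close() fd %i", fd));	/* Other errors get reported */
}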
return result;
}
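
sock_set_nonblock() above toggles O_NONBLOCK with a read-modify-write of the file status flags, and sock_try_connect() relies on it to switch the socket to non-blocking mode and back. The sketch below shows the same toggle on a pipe, where the effect is easy to observe. The names `set_nonblock` and `is_nonblock` are hypothetical stand-ins, not the flexnbd functions.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Set (enable != 0) or clear (enable == 0) O_NONBLOCK on fd, leaving the
 * other status flags untouched. Returns 0 on success, -1 on error. */
static int set_nonblock(int fd, int enable)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags == -1) {
        return -1;
    }
    flags = enable ? (flags | O_NONBLOCK) : (flags & ~O_NONBLOCK);
    return fcntl(fd, F_SETFL, flags);
}

/* Returns 1 if fd is non-blocking, 0 if it is blocking, -1 on error. */
static int is_nonblock(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    return flags == -1 ? -1 : (flags & O_NONBLOCK) != 0;
}
```

With the flag set, a read() on an empty pipe returns -1 with errno set to EAGAIN instead of blocking, which is the behaviour a select()-driven loop depends on.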

58
src/common/sockutil.h Normal file

@@ -0,0 +1,58 @@
#ifndef SOCKUTIL_H
#define SOCKUTIL_H
#include <sys/time.h>
#include <sys/socket.h>
#include <sys/select.h>
/* Returns the size of the sockaddr, or 0 on error */
size_t sockaddr_size(const struct sockaddr *sa);
/* Convert a sockaddr into an address. Like inet_ntop, it returns dest if
* successful, NULL otherwise. In the latter case, dest will contain "???"
*/
const char *sockaddr_address_string(const struct sockaddr *sa, char *dest,
size_t len);
/* Configure TCP keepalive on a socket */
int sock_set_keepalive_params(int fd, int time, int intvl, int probes);
/* Set the SO_KEEPALIVE option */
int sock_set_keepalive(int fd, int optval);
/* Set the SO_REUSEADDR option */
int sock_set_reuseaddr(int fd, int optval);
/* Set the tcp_keepidle option */
int sock_set_tcp_keepidle(int fd, int optval);
/* Set the tcp_keepintvl option */
int sock_set_tcp_keepintvl(int fd, int optval);
/* Set the tcp_keepcnt option */
int sock_set_tcp_keepcnt(int fd, int optval);
/* Set the tcp_nodelay option */
int sock_set_tcp_nodelay(int fd, int optval);
/* Set the tcp_cork option */
int sock_set_tcp_cork(int fd, int optval);
int sock_set_nonblock(int fd, int optval);
/* Attempt to bind the fd to the sockaddr, retrying common transient failures */
int sock_try_bind(int fd, const struct sockaddr *sa);
/* Try to call select(), retrying EINTR */
int sock_try_select(int nfds, fd_set * readfds, fd_set * writefds,
fd_set * exceptfds, struct timeval *timeout);
/* Try to call connect(), timing out after wait seconds */
int sock_try_connect(int fd, struct sockaddr *to, socklen_t addrlen,
int wait);
/* Try to call close(), retrying EINTR */
int sock_try_close(int fd);
#endif

91
src/common/util.c Normal file

@@ -0,0 +1,91 @@
#include <stdarg.h>
#include <stdio.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <malloc.h>
#include <unistd.h>
#include <time.h>
#include "util.h"
pthread_key_t cleanup_handler_key;
int log_level = 2;
char *log_context = "";
void error_init(void)
{
pthread_key_create(&cleanup_handler_key, free);
}
void error_handler(int fatal)
{
DECLARE_ERROR_CONTEXT(context);
if (context) {
longjmp(context->jmp, fatal ? 1 : 2);
} else {
if (fatal) {
abort();
} else {
pthread_exit((void *) 1);
}
}
}
void exit_err(const char *msg)
{
fprintf(stderr, "%s\n", msg);
exit(1);
}
void mylog(int line_level, const char *format, ...)
{
if (line_level < log_level) {
return;
}
va_list argptr;
va_start(argptr, format);
vfprintf(stderr, format, argptr);
va_end(argptr);
}
uint64_t monotonic_time_ms(void)
{
struct timespec ts;
uint64_t seconds_ms, nanoseconds_ms;
FATAL_IF_NEGATIVE(clock_gettime(CLOCK_MONOTONIC, &ts),
SHOW_ERRNO("clock_gettime failed")
);
seconds_ms = ts.tv_sec;
seconds_ms = seconds_ms * 1000;
nanoseconds_ms = ts.tv_nsec;
nanoseconds_ms = nanoseconds_ms / 1000000;
return seconds_ms + nanoseconds_ms;
}
void *xrealloc(void *ptr, size_t size)
{
void *p = realloc(ptr, size);
FATAL_IF_NULL(p, "couldn't %s %zu bytes",
ptr ? "realloc" : "malloc", size);
return p;
}
void *xmalloc(size_t size)
{
void *p = xrealloc(NULL, size);
memset(p, 0, size);
return p;
}


@@ -10,10 +10,11 @@
#include <unistd.h>
#include <inttypes.h>
void* xrealloc(void* ptr, size_t size);
void* xmalloc(size_t size);
void *xrealloc(void *ptr, size_t size);
void *xmalloc(size_t size);
typedef void (cleanup_handler)(void* /* context */, int /* is fatal? */);
typedef void (cleanup_handler) (void * /* context */ ,
int /* is fatal? */ );
/* set from 0 - 5 depending on what level of verbosity you want */
extern int log_level;
@@ -21,16 +22,19 @@ extern int log_level;
/* set up the error globals */
void error_init(void);
/* some context for the overall process that appears on each log line */
extern char *log_context;
void exit_err( const char * );
void exit_err(const char *);
/* error_set_handler must be a macro not a function due to setjmp stack rules */
#include <setjmp.h>
struct error_handler_context {
jmp_buf jmp;
cleanup_handler* handler;
void* data;
jmp_buf jmp;
cleanup_handler *handler;
void *data;
};
#define DECLARE_ERROR_CONTEXT(name) \
@@ -84,7 +88,7 @@ extern pthread_key_t cleanup_handler_key;
void error_handler(int fatal);
/* mylog a line at the given level (0 being most verbose) */
void mylog(int line_level, const char* format, ...);
void mylog(int line_level, const char *format, ...);
/* Returns the current time, in milliseconds, from CLOCK_MONOTONIC */
uint64_t monotonic_time_ms(void);
@@ -92,12 +96,12 @@ uint64_t monotonic_time_ms(void);
#define levstr(i) (i==0?'D':(i==1?'I':(i==2?'W':(i==3?'E':'F'))))
#define myloglev(level, msg, ...) mylog( level, "%"PRIu64":%c:%d %p %s:%d: "msg"\n", monotonic_time_ms(), levstr(level), getpid(),pthread_self(), __FILE__, __LINE__, ##__VA_ARGS__ )
#define myloglev(level, msg, ...) mylog( level, "%"PRIu64":%c:%d %p %s %s:%d: "msg"\n", monotonic_time_ms(), levstr(level), getpid(),pthread_self(), log_context, __FILE__, __LINE__, ##__VA_ARGS__ )
#ifdef DEBUG
# define debug(msg, ...) myloglev(0, msg, ##__VA_ARGS__)
#define debug(msg, ...) myloglev(0, msg, ##__VA_ARGS__)
#else
# define debug(msg, ...) /* no-op */
#define debug(msg, ...) /* no-op */
#endif
/* informational message, not expected to be compiled out */
@@ -116,6 +120,7 @@ uint64_t monotonic_time_ms(void);
#define fatal(msg, ...) do { \
myloglev(4, msg, ##__VA_ARGS__); \
error_handler(1); \
exit(1); /* never reached; this is to keep the static code analyzer happy */ \
} while(0)
@@ -158,4 +163,3 @@ uint64_t monotonic_time_ms(void);
#define WARN_IF_NEGATIVE( value, msg, ... ) if ( value < 0 ) { warn( msg, ##__VA_ARGS__ ); }
#endif


@@ -1,633 +0,0 @@
/* FlexNBD server (C) Bytemark Hosting 2012
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
*/
/** The control server responds on a UNIX socket and services our "remote"
* commands which are used for changing the access control list, initiating
* a mirror process, or asking for status. The protocol is pretty simple -
* after connecting the client sends a series of LF-terminated lines, followed
* by a blank line (i.e. double LF). The first line is taken to be the command
* name to invoke, and the lines before the double LF are its arguments.
*
* These commands can be invoked remotely from the command line, with the
* client code to be found in remote.c
*/
#include "control.h"
#include "mirror.h"
#include "serve.h"
#include "util.h"
#include "ioutil.h"
#include "parse.h"
#include "readwrite.h"
#include "bitset.h"
#include "self_pipe.h"
#include "acl.h"
#include "status.h"
#include "mbox.h"
#include <stdlib.h>
#include <string.h>
#include <sys/un.h>
#include <unistd.h>
struct control * control_create(
struct flexnbd * flexnbd,
const char * csn)
{
struct control * control = xmalloc( sizeof( struct control ) );
NULLCHECK( csn );
control->flexnbd = flexnbd;
control->socket_name = csn;
control->open_signal = self_pipe_create();
control->close_signal = self_pipe_create();
control->mirror_state_mbox = mbox_create();
return control;
}
void control_signal_close( struct control * control)
{
NULLCHECK( control );
self_pipe_signal( control->close_signal );
}
void control_destroy( struct control * control )
{
NULLCHECK( control );
mbox_destroy( control->mirror_state_mbox );
self_pipe_destroy( control->close_signal );
self_pipe_destroy( control->open_signal );
free( control );
}
struct control_client * control_client_create(
struct flexnbd * flexnbd,
int client_fd ,
struct mbox * state_mbox )
{
NULLCHECK( flexnbd );
struct control_client * control_client =
xmalloc( sizeof( struct control_client ) );
control_client->socket = client_fd;
control_client->flexnbd = flexnbd;
control_client->mirror_state_mbox = state_mbox;
return control_client;
}
void control_client_destroy( struct control_client * client )
{
NULLCHECK( client );
free( client );
}
void control_respond(struct control_client * client);
void control_handle_client( struct control * control, int client_fd )
{
NULLCHECK( control );
NULLCHECK( control->flexnbd );
struct control_client * control_client =
control_client_create(
control->flexnbd,
client_fd ,
control->mirror_state_mbox);
/* We intentionally don't spawn a thread for the client here.
* This is to avoid having more than one thread potentially
* waiting on the migration commit status.
*/
control_respond( control_client );
}
void control_accept_client( struct control * control )
{
int client_fd;
union mysockaddr client_address;
socklen_t addrlen = sizeof( union mysockaddr );
client_fd = accept( control->control_fd, &client_address.generic, &addrlen );
FATAL_IF( -1 == client_fd, "control accept failed" );
control_handle_client( control, client_fd );
}
int control_accept( struct control * control )
{
NULLCHECK( control );
fd_set fds;
FD_ZERO( &fds );
FD_SET( control->control_fd, &fds );
self_pipe_fd_set( control->close_signal, &fds );
debug("Control thread selecting");
FATAL_UNLESS( 0 < select( FD_SETSIZE, &fds, NULL, NULL, NULL ),
"Control select failed." );
if ( self_pipe_fd_isset( control->close_signal, &fds ) ){
return 0;
}
if ( FD_ISSET( control->control_fd, &fds ) ) {
control_accept_client( control );
}
return 1;
}
void control_accept_loop( struct control * control )
{
while( control_accept( control ) );
}
int open_control_socket( const char * socket_name )
{
struct sockaddr_un bind_address;
int control_fd;
if (!socket_name) {
fatal( "Tried to open a control socket without a socket name" );
}
control_fd = socket(AF_UNIX, SOCK_STREAM, 0);
FATAL_IF_NEGATIVE(control_fd ,
"Couldn't create control socket");
memset(&bind_address, 0, sizeof(struct sockaddr_un));
bind_address.sun_family = AF_UNIX;
strncpy(bind_address.sun_path, socket_name, sizeof(bind_address.sun_path)-1);
//unlink(socket_name); /* ignore failure */
FATAL_IF_NEGATIVE(
bind(control_fd , &bind_address, sizeof(bind_address)),
"Couldn't bind control socket to %s: %s",
socket_name, strerror( errno )
);
FATAL_IF_NEGATIVE(
listen(control_fd , 5),
"Couldn't listen on control socket"
);
return control_fd;
}
void control_listen(struct control* control)
{
NULLCHECK( control );
control->control_fd = open_control_socket( control->socket_name );
}
void control_wait_for_open_signal( struct control * control )
{
fd_set fds;
FD_ZERO( &fds );
self_pipe_fd_set( control->open_signal, &fds );
FATAL_IF_NEGATIVE( select( FD_SETSIZE, &fds, NULL, NULL, NULL ),
"select() failed" );
self_pipe_signal_clear( control->open_signal );
}
void control_serve( struct control * control )
{
NULLCHECK( control );
control_wait_for_open_signal( control );
control_listen( control );
while( control_accept( control ) );
}
void control_cleanup(
struct control * control,
int fatal __attribute__((unused)) )
{
NULLCHECK( control );
unlink( control->socket_name );
close( control->control_fd );
}
void * control_runner( void * control_uncast )
{
debug("Control thread");
NULLCHECK( control_uncast );
struct control * control = (struct control *)control_uncast;
error_set_handler( (cleanup_handler*)control_cleanup, control );
control_serve( control );
control_cleanup( control, 0 );
pthread_exit( NULL );
}
#define write_socket(msg) write(client_fd, (msg "\n"), strlen((msg))+1)
void control_write_mirror_response( enum mirror_state mirror_state, int client_fd )
{
switch (mirror_state) {
case MS_INIT:
case MS_UNKNOWN:
write_socket( "1: Mirror failed to initialise" );
fatal( "Impossible mirror state: %d", mirror_state );
case MS_FAIL_CONNECT:
write_socket( "1: Mirror failed to connect");
break;
case MS_FAIL_REJECTED:
write_socket( "1: Mirror was rejected" );
break;
case MS_FAIL_NO_HELLO:
write_socket( "1: Remote server failed to respond");
break;
case MS_FAIL_SIZE_MISMATCH:
write_socket( "1: Remote size does not match local size" );
break;
case MS_ABANDONED:
write_socket( "1: Mirroring abandoned" );
break;
case MS_GO:
case MS_DONE: /* Yes, I know we know better, but it's simpler this way */
write_socket( "0: Mirror started" );
break;
default:
fatal( "Unhandled mirror state: %d", mirror_state );
}
}
#undef write_socket
/* Call this in the thread where you want to receive the mirror state */
enum mirror_state control_client_mirror_wait(
struct control_client* client)
{
NULLCHECK( client );
NULLCHECK( client->mirror_state_mbox );
struct mbox * mbox = client->mirror_state_mbox;
enum mirror_state mirror_state;
enum mirror_state * contents;
contents = (enum mirror_state*)mbox_receive( mbox );
NULLCHECK( contents );
mirror_state = *contents;
free( contents );
return mirror_state;
}
#define write_socket(msg) write(client->socket, (msg "\n"), strlen((msg))+1)
/** Command parser to start mirror process from socket input */
int control_mirror(struct control_client* client, int linesc, char** lines)
{
NULLCHECK( client );
struct flexnbd * flexnbd = client->flexnbd;
union mysockaddr *connect_to = xmalloc( sizeof( union mysockaddr ) );
union mysockaddr *connect_from = NULL;
uint64_t max_Bps = UINT64_MAX;
int action_at_finish;
int raw_port;
if (linesc < 2) {
write_socket("1: mirror takes at least two parameters");
return -1;
}
if (parse_ip_to_sockaddr(&connect_to->generic, lines[0]) == 0) {
write_socket("1: bad IP address");
return -1;
}
raw_port = atoi(lines[1]);
if (raw_port < 0 || raw_port > 65535) {
write_socket("1: bad IP port number");
return -1;
}
connect_to->v4.sin_port = htobe16(raw_port);
action_at_finish = ACTION_EXIT;
if (linesc > 2) {
if (strcmp("exit", lines[2]) == 0) {
action_at_finish = ACTION_EXIT;
}
else if (strcmp( "unlink", lines[2]) == 0 ) {
action_at_finish = ACTION_UNLINK;
}
else if (strcmp("nothing", lines[2]) == 0) {
action_at_finish = ACTION_NOTHING;
}
else {
write_socket("1: action must be 'exit', 'unlink' or 'nothing'");
return -1;
}
}
if (linesc > 3) {
connect_from = xmalloc( sizeof( union mysockaddr ) );
if (parse_ip_to_sockaddr(&connect_from->generic, lines[3]) == 0) {
write_socket("1: bad bind address");
return -1;
}
}
if (linesc > 4) {
errno = 0;
max_Bps = strtoull( lines[4], NULL, 10 );
if ( errno == ERANGE ) {
write_socket( "1: max_bps out of range" );
return -1;
} else if ( errno != 0 ) {
write_socket( "1: max_bps couldn't be parsed" );
return -1;
}
}
if (linesc > 5) {
write_socket("1: unrecognised parameters to mirror");
return -1;
}
struct server * serve = flexnbd_server(flexnbd);
server_lock_start_mirror( serve );
{
if ( server_mirror_can_start( serve ) ) {
serve->mirror_super = mirror_super_create(
serve->filename,
connect_to,
connect_from,
max_Bps ,
action_at_finish,
client->mirror_state_mbox );
serve->mirror = serve->mirror_super->mirror;
server_prevent_mirror_start( serve );
} else {
if ( serve->mirror_super ) {
warn( "Tried to start a second mirror run" );
write_socket( "1: mirror already running" );
} else {
warn( "Cannot start mirroring, shutting down" );
write_socket( "1: shutting down" );
}
}
}
server_unlock_start_mirror( serve );
/* Do this outside the lock to minimise the length of time the
* sighandler can block the serve thread
*/
if ( serve->mirror_super ) {
FATAL_IF( 0 != pthread_create(
&serve->mirror_super->thread,
NULL,
mirror_super_runner,
serve
),
"Failed to create mirror thread"
);
debug("Control thread mirror super waiting");
enum mirror_state state =
control_client_mirror_wait( client );
debug("Control thread writing response");
control_write_mirror_response( state, client->socket );
}
debug( "Control thread going away." );
return 0;
}
int control_mirror_max_bps( struct control_client* client, int linesc, char** lines )
{
NULLCHECK( client );
NULLCHECK( client->flexnbd );
struct server* serve = flexnbd_server( client->flexnbd );
uint64_t max_Bps;
if ( !serve->mirror_super ) {
write_socket( "1: Not currently mirroring" );
return -1;
}
if ( linesc != 1 ) {
write_socket( "1: Bad format" );
return -1;
}
errno = 0;
max_Bps = strtoull( lines[0], NULL, 10 );
if ( errno == ERANGE ) {
write_socket( "1: max_bps out of range" );
return -1;
} else if ( errno != 0 ) {
write_socket( "1: max_bps couldn't be parsed" );
return -1;
}
serve->mirror->max_bytes_per_second = max_Bps;
write_socket( "0: updated" );
return 0;
}
#undef write_socket
/** Command parser to alter access control list from socket input */
int control_acl(struct control_client* client, int linesc, char** lines)
{
NULLCHECK( client );
NULLCHECK( client->flexnbd );
struct flexnbd * flexnbd = client->flexnbd;
int default_deny = flexnbd_default_deny( flexnbd );
struct acl * new_acl = acl_create( linesc, lines, default_deny );
if (new_acl->len != linesc) {
warn("Bad ACL spec: %s", lines[new_acl->len] );
write(client->socket, "1: bad spec: ", 13);
write(client->socket, lines[new_acl->len],
strlen(lines[new_acl->len]));
write(client->socket, "\n", 1);
acl_destroy( new_acl );
}
else {
flexnbd_replace_acl( flexnbd, new_acl );
info("ACL set");
write( client->socket, "0: updated\n", 11);
}
return 0;
}
int control_break(
struct control_client* client,
int linesc __attribute__ ((unused)),
char** lines __attribute__((unused))
)
{
NULLCHECK( client );
NULLCHECK( client->flexnbd );
int result = 0;
struct flexnbd* flexnbd = client->flexnbd;
struct server * serve = flexnbd_server( flexnbd );
server_lock_start_mirror( serve );
{
if ( server_is_mirroring( serve ) ) {
info( "Signaling to abandon mirror" );
server_abandon_mirror( serve );
debug( "Abandon signaled" );
if ( server_is_closed( serve ) ) {
info( "Mirror completed while canceling" );
write( client->socket,
"1: mirror completed\n", 20 );
}
else {
info( "Mirror successfully stopped." );
write( client->socket,
"0: mirror stopped\n", 18 );
result = 1;
}
} else {
warn( "Not mirroring." );
write( client->socket, "1: not mirroring\n", 17 );
}
}
server_unlock_start_mirror( serve );
return result;
}
/** FIXME: add some useful statistics */
int control_status(
struct control_client* client,
int linesc __attribute__ ((unused)),
char** lines __attribute__((unused))
)
{
NULLCHECK( client );
NULLCHECK( client->flexnbd );
struct status * status = flexnbd_status_create( client->flexnbd );
write( client->socket, "0: ", 3 );
status_write( status, client->socket );
status_destroy( status );
return 0;
}
void control_client_cleanup(struct control_client* client,
int fatal __attribute__ ((unused)) )
{
if (client->socket) { close(client->socket); }
/* This is wrongness */
if ( server_acl_locked( client->flexnbd->serve ) ) { server_unlock_acl( client->flexnbd->serve ); }
control_client_destroy( client );
}
/** Master command parser for control socket connections, delegates quickly */
void control_respond(struct control_client * client)
{
char **lines = NULL;
error_set_handler((cleanup_handler*) control_client_cleanup, client);
int i, linesc;
linesc = read_lines_until_blankline(client->socket, 256, &lines);
if (linesc < 1)
{
write(client->socket, "9: missing command\n", 19);
/* ignore failure */
}
else if (strcmp(lines[0], "acl") == 0) {
info("acl command received" );
if (control_acl(client, linesc-1, lines+1) < 0) {
debug("acl command failed");
}
}
else if (strcmp(lines[0], "mirror") == 0) {
info("mirror command received" );
if (control_mirror(client, linesc-1, lines+1) < 0) {
debug("mirror command failed");
}
}
else if (strcmp(lines[0], "break") == 0) {
info( "break command received" );
if ( control_break( client, linesc-1, lines+1) < 0) {
debug( "break command failed" );
}
}
else if (strcmp(lines[0], "status") == 0) {
info("status command received" );
if (control_status(client, linesc-1, lines+1) < 0) {
debug("status command failed");
}
} else if ( strcmp( lines[0], "mirror_max_bps" ) == 0 ) {
info( "mirror_max_bps command received" );
if( control_mirror_max_bps( client, linesc-1, lines+1 ) < 0 ) {
debug( "mirror_max_bps command failed" );
}
}
else {
write(client->socket, "10: unknown command\n", 20);
}
for (i=0; i<linesc; i++) {
free(lines[i]);
}
free(lines);
control_client_cleanup(client, 0);
debug("control command handled" );
}
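
The dispatcher above reads newline-terminated lines until a blank line, treats the first line as the command, and passes the remaining lines as arguments. A minimal client-side sketch of that framing follows. The helper name `build_control_request` and its signature are assumptions for illustration and are not part of flexnbd. Since `control_respond()` reads with a maximum line length of 256, a caller would presumably keep each line shorter than that.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of flexnbd: serialise a control request as
 * newline-terminated lines followed by one blank line, the framing that
 * control_respond() consumes via read_lines_until_blankline().
 * Returns the number of bytes written (excluding the NUL), or -1 if `out`
 * is too small. */
int build_control_request(char *out, size_t outlen,
                          const char *command, int argc, const char **argv)
{
    size_t used = 0;
    int i;

    assert(out != NULL && command != NULL);

    /* i == -1 emits the command itself; 0..argc-1 emit the arguments. */
    for (i = -1; i < argc; i++) {
        const char *line = (i < 0) ? command : argv[i];
        int n = snprintf(out + used, outlen - used, "%s\n", line);
        if (n < 0 || (size_t) n >= outlen - used) { return -1; }
        used += (size_t) n;
    }

    /* The terminating blank line, plus room for the NUL. */
    if (outlen - used < 2) { return -1; }
    out[used++] = '\n';
    out[used] = '\0';
    return (int) used;
}
```

For a mirror request the arguments would be, in order, the destination address, the port, and optionally the finish action, the bind address and the byte-rate limit, matching the order in which `control_mirror()` inspects `lines[0]` to `lines[4]`.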

@@ -1,59 +0,0 @@
#ifndef CONTROL_H
#define CONTROL_H
/* We need this to avoid a complaint about struct server * in
* void accept_control_connection
*/
struct server;
#include "parse.h"
#include "mirror.h"
#include "serve.h"
#include "flexnbd.h"
#include "mbox.h"
struct control {
struct flexnbd * flexnbd;
int control_fd;
const char * socket_name;
pthread_t thread;
struct self_pipe * open_signal;
struct self_pipe * close_signal;
/* This is owned by the control object, and used by a
* mirror_super to communicate the state of a mirror attempt as
* early as feasible. It can't be owned by the mirror_super
* object because the mirror_super object can be freed at any
* time (including while the control_client is waiting on it),
* whereas the control object lasts for the lifetime of the
* process (and we can only have a mirror thread if the control
* thread has started it).
*/
struct mbox * mirror_state_mbox;
};
struct control_client{
int socket;
struct flexnbd * flexnbd;
/* Passed in on creation. We know it's all right to do this
* because we know there's only ever one control_client.
*/
struct mbox * mirror_state_mbox;
};
struct control * control_create(
struct flexnbd *,
const char * control_socket_name );
void control_signal_close( struct control * );
void control_destroy( struct control * );
void * control_runner( void * );
void accept_control_connection(struct server* params, int client_fd, union mysockaddr* client_address);
void serve_open_control_socket(struct server* params);
#endif

@@ -1,249 +0,0 @@
/* FlexNBD server (C) Bytemark Hosting 2012
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
*/
/** main() function for parsing and dispatching commands. Each mode has
* a corresponding structure which is filled in and passed to a do_ function
* elsewhere in the program.
*/
#include "flexnbd.h"
#include "serve.h"
#include "util.h"
#include "control.h"
#include "status.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/signalfd.h>
#include <fcntl.h>
#include <unistd.h>
#include <signal.h>
#include <getopt.h>
#include "acl.h"
int flexnbd_build_signal_fd(void)
{
sigset_t mask;
int sfd;
sigemptyset( &mask );
sigaddset( &mask, SIGTERM );
sigaddset( &mask, SIGQUIT );
sigaddset( &mask, SIGINT );
FATAL_UNLESS( 0 == pthread_sigmask( SIG_BLOCK, &mask, NULL ),
"Signal blocking failed" );
sfd = signalfd( -1, &mask, 0 );
FATAL_IF( -1 == sfd, "Failed to get a signal fd" );
return sfd;
}
void flexnbd_create_shared(
struct flexnbd * flexnbd,
const char * s_ctrl_sock)
{
NULLCHECK( flexnbd );
if ( s_ctrl_sock ){
flexnbd->control =
control_create( flexnbd, s_ctrl_sock );
}
else {
flexnbd->control = NULL;
}
flexnbd->signal_fd = flexnbd_build_signal_fd();
}
struct flexnbd * flexnbd_create_serving(
char* s_ip_address,
char* s_port,
char* s_file,
char* s_ctrl_sock,
int default_deny,
int acl_entries,
char** s_acl_entries,
int max_nbd_clients,
int use_killswitch)
{
struct flexnbd * flexnbd = xmalloc( sizeof( struct flexnbd ) );
flexnbd->serve = server_create(
flexnbd,
s_ip_address,
s_port,
s_file,
default_deny,
acl_entries,
s_acl_entries,
max_nbd_clients,
use_killswitch,
1);
flexnbd_create_shared( flexnbd,
s_ctrl_sock );
return flexnbd;
}
struct flexnbd * flexnbd_create_listening(
char* s_ip_address,
char* s_port,
char* s_file,
char* s_ctrl_sock,
int default_deny,
int acl_entries,
char** s_acl_entries )
{
struct flexnbd * flexnbd = xmalloc( sizeof( struct flexnbd ) );
flexnbd->serve = server_create(
flexnbd,
s_ip_address,
s_port,
s_file,
default_deny,
acl_entries,
s_acl_entries,
1, 0, 0);
flexnbd_create_shared( flexnbd, s_ctrl_sock );
return flexnbd;
}
void flexnbd_spawn_control(struct flexnbd * flexnbd )
{
NULLCHECK( flexnbd );
NULLCHECK( flexnbd->control );
pthread_t * control_thread = &flexnbd->control->thread;
FATAL_UNLESS( 0 == pthread_create(
control_thread,
NULL,
control_runner,
flexnbd->control ),
"Couldn't create the control thread" );
}
void flexnbd_stop_control( struct flexnbd * flexnbd )
{
NULLCHECK( flexnbd );
NULLCHECK( flexnbd->control );
control_signal_close( flexnbd->control );
pthread_t tid = flexnbd->control->thread;
FATAL_UNLESS( 0 == pthread_join( tid, NULL ),
"Failed joining the control thread" );
debug( "Control thread %p pthread_join returned", tid );
}
int flexnbd_signal_fd( struct flexnbd * flexnbd )
{
NULLCHECK( flexnbd );
return flexnbd->signal_fd;
}
void flexnbd_destroy( struct flexnbd * flexnbd )
{
NULLCHECK( flexnbd );
if ( flexnbd->control ) {
control_destroy( flexnbd->control );
}
close( flexnbd->signal_fd );
free( flexnbd );
}
struct server * flexnbd_server( struct flexnbd * flexnbd )
{
NULLCHECK( flexnbd );
return flexnbd->serve;
}
void flexnbd_replace_acl( struct flexnbd * flexnbd, struct acl * acl )
{
NULLCHECK( flexnbd );
server_replace_acl( flexnbd_server(flexnbd), acl );
}
struct status * flexnbd_status_create( struct flexnbd * flexnbd )
{
NULLCHECK( flexnbd );
struct status * status;
status = status_create( flexnbd_server( flexnbd ) );
return status;
}
void flexnbd_set_server( struct flexnbd * flexnbd, struct server * serve )
{
NULLCHECK( flexnbd );
flexnbd->serve = serve;
}
/* Get the default_deny of the current server object. */
int flexnbd_default_deny( struct flexnbd * flexnbd )
{
NULLCHECK( flexnbd );
return server_default_deny( flexnbd->serve );
}
void make_writable( const char * filename )
{
NULLCHECK( filename );
FATAL_IF_NEGATIVE( chmod( filename, S_IWUSR ),
"Couldn't chmod %s: %s",
filename,
strerror( errno ) );
}
int flexnbd_serve( struct flexnbd * flexnbd )
{
NULLCHECK( flexnbd );
int success;
struct self_pipe * open_signal = NULL;
if ( flexnbd->control ){
debug( "Spawning control thread" );
flexnbd_spawn_control( flexnbd );
open_signal = flexnbd->control->open_signal;
}
success = do_serve( flexnbd->serve, open_signal );
debug("do_serve success is %d", success );
if ( flexnbd->control ) {
debug( "Stopping control thread" );
flexnbd_stop_control( flexnbd );
debug("Control thread stopped");
}
return success;
}

@@ -1,65 +0,0 @@
#ifndef FLEXNBD_H
#define FLEXNBD_H
#include "acl.h"
#include "mirror.h"
#include "serve.h"
#include "proxy.h"
#include "self_pipe.h"
#include "mbox.h"
#include "control.h"
#include "flexthread.h"
/* Carries the "globals". */
struct flexnbd {
/* Our serve pointer should never be dereferenced outside a
* flexnbd_switch_lock/unlock pair.
*/
struct server * serve;
/* We only have a control object if a control socket name was
* passed on the command line.
*/
struct control * control;
/* File descriptor for a signalfd(2) signal stream. */
int signal_fd;
};
struct flexnbd * flexnbd_create(void);
struct flexnbd * flexnbd_create_serving(
char* s_ip_address,
char* s_port,
char* s_file,
char* s_ctrl_sock,
int default_deny,
int acl_entries,
char** s_acl_entries,
int max_nbd_clients,
int use_killswitch);
struct flexnbd * flexnbd_create_listening(
char* s_ip_address,
char* s_port,
char* s_file,
char* s_ctrl_sock,
int default_deny,
int acl_entries,
char** s_acl_entries );
void flexnbd_destroy( struct flexnbd * );
enum mirror_state;
enum mirror_state flexnbd_get_mirror_state( struct flexnbd * );
int flexnbd_default_deny( struct flexnbd * );
void flexnbd_set_server( struct flexnbd * flexnbd, struct server * serve );
int flexnbd_signal_fd( struct flexnbd * flexnbd );
int flexnbd_serve( struct flexnbd * flexnbd );
int flexnbd_proxy( struct flexnbd * flexnbd );
struct server * flexnbd_server( struct flexnbd * flexnbd );
void flexnbd_replace_acl( struct flexnbd * flexnbd, struct acl * acl );
struct status * flexnbd_status_create( struct flexnbd * flexnbd );
#endif

@@ -1,75 +0,0 @@
#include "flexthread.h"
#include "util.h"
#include <pthread.h>
struct flexthread_mutex * flexthread_mutex_create(void)
{
struct flexthread_mutex * ftm =
xmalloc( sizeof( struct flexthread_mutex ) );
FATAL_UNLESS( 0 == pthread_mutex_init( &ftm->mutex, NULL ),
"Mutex initialisation failed" );
return ftm;
}
void flexthread_mutex_destroy( struct flexthread_mutex * ftm )
{
NULLCHECK( ftm );
if( flexthread_mutex_held( ftm ) ) {
flexthread_mutex_unlock( ftm );
}
else if ( (pthread_t)NULL != ftm->holder ) {
/* This "should never happen": if we can try to destroy
* a mutex currently held by another thread, there's a
* logic bug somewhere. I know the test here is racy,
* but there's not a lot we can do about it at this
* point.
*/
fatal( "Attempted to destroy a flexthread_mutex"\
" held by another thread!" );
}
FATAL_UNLESS( 0 == pthread_mutex_destroy( &ftm->mutex ),
"Mutex destroy failed" );
free( ftm );
}
int flexthread_mutex_lock( struct flexthread_mutex * ftm )
{
NULLCHECK( ftm );
int failure = pthread_mutex_lock( &ftm->mutex );
if ( 0 == failure ) {
ftm->holder = pthread_self();
}
return failure;
}
int flexthread_mutex_unlock( struct flexthread_mutex * ftm )
{
NULLCHECK( ftm );
pthread_t orig = ftm->holder;
ftm->holder = (pthread_t)NULL;
int failure = pthread_mutex_unlock( &ftm->mutex );
if ( 0 != failure ) {
ftm->holder = orig;
}
return failure;
}
int flexthread_mutex_held( struct flexthread_mutex * ftm )
{
NULLCHECK( ftm );
return pthread_equal( pthread_self(), ftm->holder );
}

@@ -1,350 +0,0 @@
#include <sys/mman.h>
#include <sys/sendfile.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <linux/fs.h>
#include <linux/fiemap.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include "util.h"
#include "bitset.h"
#include "ioutil.h"
int build_allocation_map(struct bitset * allocation_map, int fd)
{
/* break blocking ioctls down */
const unsigned long max_length = 100*1024*1024;
const unsigned int max_extents = 1000;
unsigned long offset = 0;
struct {
struct fiemap fiemap;
struct fiemap_extent extents[max_extents];
} fiemap_static;
struct fiemap* fiemap = (struct fiemap*) &fiemap_static;
memset(&fiemap_static, 0, sizeof(fiemap_static));
for (offset = 0; offset < allocation_map->size; ) {
unsigned int i;
fiemap->fm_start = offset;
fiemap->fm_length = max_length;
if ( offset + max_length > allocation_map->size ) {
fiemap->fm_length = allocation_map->size-offset;
}
fiemap->fm_flags = FIEMAP_FLAG_SYNC;
fiemap->fm_extent_count = max_extents;
fiemap->fm_mapped_extents = 0;
if ( ioctl( fd, FS_IOC_FIEMAP, fiemap ) < 0 ) {
debug( "Couldn't get fiemap, returning no allocation_map" );
return 0; /* it's up to the caller to free the map */
}
else {
for ( i = 0; i < fiemap->fm_mapped_extents; i++ ) {
bitset_set_range( allocation_map,
fiemap->fm_extents[i].fe_logical,
fiemap->fm_extents[i].fe_length );
}
/* must move the offset on, but careful not to jump max_length
* if we've actually hit max_extents.
*/
if (fiemap->fm_mapped_extents > 0) {
struct fiemap_extent *last = &fiemap->fm_extents[
fiemap->fm_mapped_extents-1
];
offset = last->fe_logical + last->fe_length;
}
else {
offset += fiemap->fm_length;
}
}
}
debug("Successfully built allocation map");
return 1;
}
int open_and_mmap(const char* filename, int* out_fd, off64_t *out_size, void **out_map)
{
off64_t size;
/* O_DIRECT seems to be intermittently supported. Leaving it as
* a compile-time option for now. */
#ifdef DIRECT_IO
*out_fd = open(filename, O_RDWR | O_DIRECT | O_SYNC );
#else
*out_fd = open(filename, O_RDWR | O_SYNC );
#endif
if (*out_fd < 0) {
warn("open(%s) failed: does it exist?", filename);
return *out_fd;
}
size = lseek64(*out_fd, 0, SEEK_END);
if (size < 0) {
warn("lseek64() failed");
return size;
}
if (out_size) {
*out_size = size;
}
if (out_map) {
*out_map = mmap64(NULL, size, PROT_READ|PROT_WRITE, MAP_SHARED,
*out_fd, 0);
if (((long) *out_map) == -1) {
warn("mmap64() failed");
return -1;
}
}
debug("opened %s size %ld on fd %d @ %p", filename, size, *out_fd, out_map ? *out_map : NULL);
return 0;
}
int writeloop(int filedes, const void *buffer, size_t size)
{
size_t written=0;
while (written < size) {
ssize_t result = write(filedes, buffer+written, size-written);
if (result == -1) {
if ( errno == EINTR || errno == EAGAIN || errno == EWOULDBLOCK ) {
continue; // busy-wait
}
return -1; // failure
}
written += result;
}
return 0;
}
int readloop(int filedes, void *buffer, size_t size)
{
size_t readden=0;
while (readden < size) {
ssize_t result = read(filedes, buffer+readden, size-readden);
if ( result == 0 /* EOF */ ) {
warn( "end-of-file detected while reading" );
return -1;
}
if ( result == -1 ) {
if ( errno == EINTR || errno == EAGAIN || errno == EWOULDBLOCK ) {
continue; // busy-wait
}
return -1; // failure
}
readden += result;
}
return 0;
}
int sendfileloop(int out_fd, int in_fd, off64_t *offset, size_t count)
{
size_t sent=0;
while (sent < count) {
ssize_t result = sendfile64(out_fd, in_fd, offset, count-sent);
debug("sendfile64(out_fd=%d, in_fd=%d, offset=%p, count-sent=%ld) = %ld", out_fd, in_fd, offset, count-sent, result);
if (result == -1) {
debug( "%s (%i) calling sendfile64()", strerror(errno), errno );
return -1;
}
sent += result;
debug("sent=%ld, count=%ld", sent, count);
}
debug("exiting sendfileloop");
return 0;
}
#include <errno.h>
ssize_t spliceloop(int fd_in, loff_t *off_in, int fd_out, loff_t *off_out, size_t len, unsigned int flags2)
{
const unsigned int flags = SPLICE_F_MORE|SPLICE_F_MOVE|flags2;
size_t spliced=0;
//debug("spliceloop(%d, %ld, %d, %ld, %ld)", fd_in, off_in ? *off_in : 0, fd_out, off_out ? *off_out : 0, len);
while (spliced < len) {
ssize_t result = splice(fd_in, off_in, fd_out, off_out, len - spliced, flags);
if (result < 0) {
//debug("result=%ld (%s), spliced=%ld, len=%ld", result, strerror(errno), spliced, len);
if (errno == EAGAIN && (flags & SPLICE_F_NONBLOCK) ) {
return spliced;
}
else {
return -1;
}
} else {
spliced += result;
//debug("result=%ld (%s), spliced=%ld, len=%ld", result, strerror(errno), spliced, len);
}
}
return spliced;
}
int splice_via_pipe_loop(int fd_in, int fd_out, size_t len)
{
int pipefd[2]; /* read end, write end */
size_t spliced=0;
if (pipe(pipefd) == -1) {
return -1;
}
while (spliced < len) {
ssize_t run = len-spliced;
ssize_t s2, s1 = spliceloop(fd_in, NULL, pipefd[1], NULL, run, SPLICE_F_NONBLOCK);
/*if (run > 65535)
run = 65535;*/
if (s1 < 0) { break; }
s2 = spliceloop(pipefd[0], NULL, fd_out, NULL, s1, 0);
if (s2 < 0) { break; }
spliced += s2;
}
close(pipefd[0]);
close(pipefd[1]);
return spliced < len ? -1 : 0;
}
/* Reads single bytes from fd until either an EOF or a newline appears.
* If an EOF occurs before a newline, returns -1. The line is lost.
* Inserts the read bytes (without the newline) into buf, followed by a
* trailing NUL.
* Returns the number of bytes stored: the length of the line without the
* newline, plus one for the trailing NUL.
*/
int read_until_newline(int fd, char* buf, int bufsize)
{
int cur;
for (cur=0; cur < bufsize; cur++) {
int result = read(fd, buf+cur, 1);
if (result <= 0) { return -1; }
if (buf[cur] == '\n') {
buf[cur] = '\0';
break;
}
}
return cur+1;
}
int read_lines_until_blankline(int fd, int max_line_length, char ***lines)
{
int lines_count = 0;
char line[max_line_length+1];
*lines = NULL;
memset(line, 0, max_line_length+1);
while (1) {
int readden = read_until_newline(fd, line, max_line_length);
/* readden will be:
* 1 for an empty line
* -1 for an eof
* -1 for a read error
*/
if (readden <= 1) { return lines_count; }
*lines = xrealloc(*lines, (lines_count+1) * sizeof(char*));
(*lines)[lines_count] = strdup(line);
if ((*lines)[lines_count][0] == 0) {
return lines_count;
}
lines_count++;
}
}
int fd_is_closed( int fd_in )
{
int errno_old = errno;
int result = fcntl( fd_in, F_GETFL ) < 0;
errno = errno_old;
return result;
}
static inline int io_errno_permanent(void)
{
return ( errno != EAGAIN && errno != EWOULDBLOCK && errno != EINTR );
}
/* Returns -1 if the operation failed, or the number of bytes read if all is
* well. Note that 0 bytes may be returned. Unlike read(), this is not an EOF! */
ssize_t iobuf_read(int fd, struct iobuf *iobuf, size_t default_size )
{
size_t left;
ssize_t count;
if ( iobuf->needle == 0 ) {
iobuf->size = default_size;
}
left = iobuf->size - iobuf->needle;
debug( "Reading %"PRIu32" of %"PRIu32" bytes from fd %i", left, iobuf->size, fd );
count = read( fd, iobuf->buf + iobuf->needle, left );
if ( count > 0 ) {
iobuf->needle += count;
debug( "read() returned %"PRIu32" bytes", count );
} else if ( count == 0 ) {
warn( "read() returned EOF on fd %i", fd );
errno = 0;
return -1;
} else if ( count == -1 ) {
if ( io_errno_permanent() ) {
warn( SHOW_ERRNO( "read() failed on fd %i", fd ) );
} else {
debug( SHOW_ERRNO( "read() returned 0 bytes" ) );
count = 0;
}
}
return count;
}
ssize_t iobuf_write( int fd, struct iobuf *iobuf )
{
size_t left = iobuf->size - iobuf->needle;
ssize_t count;
debug( "Writing %"PRIu32" of %"PRIu32" bytes to fd %i", left, iobuf->size, fd );
count = write( fd, iobuf->buf + iobuf->needle, left );
if ( count >= 0 ) {
iobuf->needle += count;
debug( "write() returned %"PRIu32" bytes", count );
} else {
if ( io_errno_permanent() ) {
warn( SHOW_ERRNO( "write() failed on fd %i", fd ) );
} else {
debug( SHOW_ERRNO( "write() returned 0 bytes" ) );
count = 0;
}
}
return count;
}
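
`writeloop()` and `readloop()` retry immediately on `EAGAIN`, which spins the CPU when the descriptor is non-blocking. A minimal sketch of an alternative that sleeps in poll(2) until the descriptor is writable is shown here. The name `writeloop_poll` is an assumption and this is not the project's implementation. It also walks the buffer through a `const char *` so that it does not rely on arithmetic on `void *`, which is a GNU extension.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical variant of writeloop(): write all `size` bytes to fd,
 * waiting in poll() instead of spinning when the fd is not ready.
 * Returns 0 on success, -1 on a permanent error (errno is set). */
int writeloop_poll(int fd, const void *buffer, size_t size)
{
    const char *p = buffer;
    size_t written = 0;

    assert(buffer != NULL || size == 0);

    while (written < size) {
        ssize_t result = write(fd, p + written, size - written);
        if (result >= 0) {
            written += (size_t) result;
            continue;
        }
        if (errno == EINTR) { continue; }   /* interrupted: just retry */
        if (errno == EAGAIN || errno == EWOULDBLOCK) {
            struct pollfd pfd = { .fd = fd, .events = POLLOUT, .revents = 0 };
            /* Sleep until writable; an EINTR from poll() falls through
             * to another write attempt. */
            if (poll(&pfd, 1, -1) < 0 && errno != EINTR) { return -1; }
            continue;
        }
        return -1;                          /* permanent failure */
    }
    return 0;
}
```

The same shape applies to the read side with `POLLIN`, with the extra case that a `read()` result of 0 means end-of-file rather than "try again".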

@@ -2,17 +2,20 @@
#include "mode.h"
#include <signal.h>
#include <stdlib.h>
#include <time.h>
int main(int argc, char** argv)
int main(int argc, char **argv)
{
signal(SIGPIPE, SIG_IGN); /* calls to splice() unhelpfully throw this */
error_init();
signal(SIGPIPE, SIG_IGN); /* calls to splice() unhelpfully throw this */
error_init();
if (argc < 2) {
exit_err( help_help_text );
}
mode(argv[1], argc-1, argv+1); /* never returns */
srand(time(NULL));
return 0;
if (argc < 2) {
exit_err(help_help_text);
}
mode(argv[1], argc - 1, argv + 1); /* never returns */
return 0;
}

@@ -1,77 +0,0 @@
#include "mbox.h"
#include "util.h"
#include <pthread.h>
struct mbox * mbox_create( void )
{
struct mbox * mbox = xmalloc( sizeof( struct mbox ) );
FATAL_UNLESS( 0 == pthread_cond_init( &mbox->filled_cond, NULL ),
"Failed to initialise a condition variable" );
FATAL_UNLESS( 0 == pthread_cond_init( &mbox->emptied_cond, NULL ),
"Failed to initialise a condition variable" );
FATAL_UNLESS( 0 == pthread_mutex_init( &mbox->mutex, NULL ),
"Failed to initialise a mutex" );
return mbox;
}
void mbox_post( struct mbox * mbox, void * contents )
{
pthread_mutex_lock( &mbox->mutex );
{
while (mbox->full) {
pthread_cond_wait( &mbox->emptied_cond, &mbox->mutex );
}
mbox->contents = contents;
mbox->full = 1;
while( 0 != pthread_cond_signal( &mbox->filled_cond ) );
}
pthread_mutex_unlock( &mbox->mutex );
}
void * mbox_contents( struct mbox * mbox )
{
return mbox->contents;
}
int mbox_is_full( struct mbox * mbox )
{
return mbox->full;
}
void * mbox_receive( struct mbox * mbox )
{
NULLCHECK( mbox );
void * result;
pthread_mutex_lock( &mbox->mutex );
{
while ( !mbox->full ) {
pthread_cond_wait( &mbox->filled_cond, &mbox->mutex );
}
mbox->full = 0;
result = mbox->contents;
mbox->contents = NULL;
while( 0 != pthread_cond_signal( &mbox->emptied_cond));
}
pthread_mutex_unlock( &mbox->mutex );
return result;
}
void mbox_destroy( struct mbox * mbox )
{
NULLCHECK( mbox );
while( 0 != pthread_cond_destroy( &mbox->emptied_cond ) );
while( 0 != pthread_cond_destroy( &mbox->filled_cond ) );
while( 0 != pthread_mutex_destroy( &mbox->mutex ) );
free( mbox );
}
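
The mailbox above is a one-slot rendezvous built from a mutex and two condition variables. POSIX allows `pthread_cond_wait()` to wake spuriously, so the usual idiom re-checks the predicate in a loop after every wakeup. The following is a self-contained sketch of that idiom. The `struct slot` type and the `slot_*` names are assumptions for illustration, not the project's API.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical one-slot mailbox mirroring the shape of struct mbox. */
struct slot {
    pthread_mutex_t mutex;
    pthread_cond_t filled;
    pthread_cond_t emptied;
    void *contents;
    int full;
};

int slot_init(struct slot *s)
{
    assert(s != NULL);
    s->contents = NULL;
    s->full = 0;
    if (pthread_mutex_init(&s->mutex, NULL) != 0) { return -1; }
    if (pthread_cond_init(&s->filled, NULL) != 0) { return -1; }
    if (pthread_cond_init(&s->emptied, NULL) != 0) { return -1; }
    return 0;
}

/* Blocks while the slot is occupied, then stores `contents`. */
void slot_post(struct slot *s, void *contents)
{
    pthread_mutex_lock(&s->mutex);
    while (s->full) {               /* re-check after every wakeup */
        pthread_cond_wait(&s->emptied, &s->mutex);
    }
    s->contents = contents;
    s->full = 1;
    pthread_cond_signal(&s->filled);
    pthread_mutex_unlock(&s->mutex);
}

/* Blocks while the slot is empty, then takes and returns its contents. */
void *slot_receive(struct slot *s)
{
    void *result;

    pthread_mutex_lock(&s->mutex);
    while (!s->full) {              /* re-check after every wakeup */
        pthread_cond_wait(&s->filled, &s->mutex);
    }
    result = s->contents;
    s->contents = NULL;
    s->full = 0;
    pthread_cond_signal(&s->emptied);
    pthread_mutex_unlock(&s->mutex);
    return result;
}
```

With a single poster and a single receiver the loop rarely iterates more than once, but it costs nothing and keeps the slot correct if a wakeup arrives without the predicate having changed.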

File diff suppressed because it is too large

@@ -1,128 +0,0 @@
#ifndef MIRROR_H
#define MIRROR_H
#include <sys/types.h>
#include <unistd.h>
#include <pthread.h>
#include "bitset.h"
#include "self_pipe.h"
enum mirror_state;
#include "serve.h"
#include "mbox.h"
/* MS_CONNECT_TIME_SECS
* The length of time after which the sender will assume a connect() to
* the destination has failed.
*/
#define MS_CONNECT_TIME_SECS 60
/* MS_HELLO_TIME_SECS
* The length of time the sender will wait for the NBD hello message
* after connect() before aborting the connection attempt.
*/
#define MS_HELLO_TIME_SECS 5
/* MS_RETRY_DELAY_SECS
* The delay after a failed migration attempt before launching another
* thread to try again.
*/
#define MS_RETRY_DELAY_SECS 1
/* MS_REQUEST_LIMIT_SECS
* We must receive a reply to a request within this time. For a read
* request, this is the time between the end of the NBD request and the
* start of the NBD reply. For a write request, this is the time
* between the end of the written data and the start of the NBD reply.
*/
#define MS_REQUEST_LIMIT_SECS 4
#define MS_REQUEST_LIMIT_SECS_F 4.0
enum mirror_finish_action {
ACTION_EXIT,
ACTION_UNLINK,
ACTION_NOTHING
};
enum mirror_state {
MS_UNKNOWN,
MS_INIT,
MS_GO,
MS_ABANDONED,
MS_DONE,
MS_FAIL_CONNECT,
MS_FAIL_REJECTED,
MS_FAIL_NO_HELLO,
MS_FAIL_SIZE_MISMATCH
};
struct mirror {
pthread_t thread;
/* Signal to this then join the thread if you want to abandon mirroring */
struct self_pipe * abandon_signal;
union mysockaddr * connect_to;
union mysockaddr * connect_from;
int client;
const char * filename;
/* Limiter, used to restrict migration speed. Only dirty bytes (those going
* over the network) are considered. */
uint64_t max_bytes_per_second;
enum mirror_finish_action action_at_finish;
char *mapped;
/* We need to send every byte at least once; we do so by */
uint64_t offset;
enum mirror_state commit_state;
/* commit_signal is sent immediately after attempting to connect
* and checking the remote size, whether successful or not.
*/
struct mbox * commit_signal;
/* The time (from monotonic_time_ms()) the migration was started. Can be
* used to calculate bps, etc. */
uint64_t migration_started;
/* Running count of all bytes we've transferred */
uint64_t all_dirty;
};
struct mirror_super {
struct mirror * mirror;
pthread_t thread;
struct mbox * state_mbox;
};
/* We need these declarations to get around circular dependencies in the
* .h's
*/
struct server;
struct flexnbd;
struct mirror_super * mirror_super_create(
const char * filename,
union mysockaddr * connect_to,
union mysockaddr * connect_from,
uint64_t max_Bps,
enum mirror_finish_action action_at_finish,
struct mbox * state_mbox
);
void * mirror_super_runner( void * serve_uncast );
uint64_t mirror_current_bps( struct mirror * mirror );
#endif

@@ -1,887 +0,0 @@
#include "mode.h"
#include "flexnbd.h"
#include <getopt.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
static struct option serve_options[] = {
GETOPT_HELP,
GETOPT_ADDR,
GETOPT_PORT,
GETOPT_FILE,
GETOPT_SOCK,
GETOPT_DENY,
GETOPT_QUIET,
GETOPT_KILLSWITCH,
GETOPT_VERBOSE,
{0}
};
static char serve_short_options[] = "hl:p:f:s:dk" SOPT_QUIET SOPT_VERBOSE;
static char serve_help_text[] =
"Usage: flexnbd " CMD_SERVE " <options> [<acl address>*]\n\n"
"Serve FILE from ADDR:PORT, with an optional control socket at SOCK.\n\n"
HELP_LINE
"\t--" OPT_ADDR ",-l <ADDR>\tThe address to serve on.\n"
"\t--" OPT_PORT ",-p <PORT>\tThe port to serve on.\n"
"\t--" OPT_FILE ",-f <FILE>\tThe file to serve.\n"
"\t--" OPT_DENY ",-d\tDeny connections by default unless in ACL.\n"
"\t--" OPT_KILLSWITCH",-k \tKill the server if a request takes 120 seconds.\n"
SOCK_LINE
VERBOSE_LINE
QUIET_LINE;
static struct option listen_options[] = {
GETOPT_HELP,
GETOPT_ADDR,
GETOPT_PORT,
GETOPT_FILE,
GETOPT_SOCK,
GETOPT_DENY,
GETOPT_QUIET,
GETOPT_VERBOSE,
{0}
};
static char listen_short_options[] = "hl:p:f:s:d" SOPT_QUIET SOPT_VERBOSE;
static char listen_help_text[] =
"Usage: flexnbd " CMD_LISTEN " <options> [<acl_address>*]\n\n"
"Listen for an incoming migration on ADDR:PORT.\n\n"
HELP_LINE
"\t--" OPT_ADDR ",-l <ADDR>\tThe address to listen on.\n"
"\t--" OPT_PORT ",-p <PORT>\tThe port to listen on.\n"
"\t--" OPT_FILE ",-f <FILE>\tThe file to serve.\n"
"\t--" OPT_DENY ",-d\tDeny connections by default unless in ACL.\n"
SOCK_LINE
VERBOSE_LINE
QUIET_LINE;
static struct option read_options[] = {
GETOPT_HELP,
GETOPT_ADDR,
GETOPT_PORT,
GETOPT_FROM,
GETOPT_SIZE,
GETOPT_BIND,
GETOPT_QUIET,
GETOPT_VERBOSE,
{0}
};
static char read_short_options[] = "hl:p:F:S:b:" SOPT_QUIET SOPT_VERBOSE;
static char read_help_text[] =
"Usage: flexnbd " CMD_READ " <options>\n\n"
"Read SIZE bytes from a server at ADDR:PORT to stdout, starting at OFFSET.\n\n"
HELP_LINE
"\t--" OPT_ADDR ",-l <ADDR>\tThe address to read from.\n"
"\t--" OPT_PORT ",-p <PORT>\tThe port to read from.\n"
"\t--" OPT_FROM ",-F <OFFSET>\tByte offset to read from.\n"
"\t--" OPT_SIZE ",-S <SIZE>\tBytes to read.\n"
BIND_LINE
VERBOSE_LINE
QUIET_LINE;
static struct option *write_options = read_options;
static char *write_short_options = read_short_options;
static char write_help_text[] =
"Usage: flexnbd " CMD_WRITE" <options>\n\n"
"Write SIZE bytes from stdin to a server at ADDR:PORT, starting at OFFSET.\n\n"
HELP_LINE
"\t--" OPT_ADDR ",-l <ADDR>\tThe address to write to.\n"
"\t--" OPT_PORT ",-p <PORT>\tThe port to write to.\n"
"\t--" OPT_FROM ",-F <OFFSET>\tByte offset to write from.\n"
"\t--" OPT_SIZE ",-S <SIZE>\tBytes to write.\n"
BIND_LINE
VERBOSE_LINE
QUIET_LINE;
static struct option acl_options[] = {
GETOPT_HELP,
GETOPT_SOCK,
GETOPT_QUIET,
GETOPT_VERBOSE,
{0}
};
static char acl_short_options[] = "hs:" SOPT_QUIET SOPT_VERBOSE;
static char acl_help_text[] =
"Usage: flexnbd " CMD_ACL " <options> [<acl address>+]\n\n"
"Set the access control list for a server with control socket SOCK.\n\n"
HELP_LINE
SOCK_LINE
VERBOSE_LINE
QUIET_LINE;
static struct option mirror_speed_options[] = {
GETOPT_HELP,
GETOPT_SOCK,
GETOPT_MAX_SPEED,
GETOPT_QUIET,
GETOPT_VERBOSE,
{0}
};
static char mirror_speed_short_options[] = "hs:m:" SOPT_QUIET SOPT_VERBOSE;
static char mirror_speed_help_text[] =
"Usage: flexnbd " CMD_MIRROR_SPEED " <options>\n\n"
"Set the maximum speed of a migration from a mirroring server listening on SOCK.\n\n"
HELP_LINE
SOCK_LINE
MAX_SPEED_LINE
VERBOSE_LINE
QUIET_LINE;
static struct option mirror_options[] = {
GETOPT_HELP,
GETOPT_SOCK,
GETOPT_ADDR,
GETOPT_PORT,
GETOPT_UNLINK,
GETOPT_BIND,
GETOPT_QUIET,
GETOPT_VERBOSE,
{0}
};
static char mirror_short_options[] = "hs:l:p:ub:" SOPT_QUIET SOPT_VERBOSE;
static char mirror_help_text[] =
"Usage: flexnbd " CMD_MIRROR " <options>\n\n"
"Start mirroring from the server with control socket SOCK to one at ADDR:PORT.\n\n"
HELP_LINE
"\t--" OPT_ADDR ",-l <ADDR>\tThe address to mirror to.\n"
"\t--" OPT_PORT ",-p <PORT>\tThe port to mirror to.\n"
SOCK_LINE
"\t--" OPT_UNLINK ",-u\tUnlink the local file when done.\n"
BIND_LINE
VERBOSE_LINE
QUIET_LINE;
static struct option break_options[] = {
GETOPT_HELP,
GETOPT_SOCK,
GETOPT_QUIET,
GETOPT_VERBOSE,
{0}
};
static char break_short_options[] = "hs:" SOPT_QUIET SOPT_VERBOSE;
static char break_help_text[] =
"Usage: flexnbd " CMD_BREAK " <options>\n\n"
"Stop mirroring from the server with control socket SOCK.\n\n"
HELP_LINE
SOCK_LINE
VERBOSE_LINE
QUIET_LINE;
static struct option status_options[] = {
GETOPT_HELP,
GETOPT_SOCK,
GETOPT_QUIET,
GETOPT_VERBOSE,
{0}
};
static char status_short_options[] = "hs:" SOPT_QUIET SOPT_VERBOSE;
static char status_help_text[] =
"Usage: flexnbd " CMD_STATUS " <options>\n\n"
"Get the status for a server with control socket SOCK.\n\n"
HELP_LINE
SOCK_LINE
VERBOSE_LINE
QUIET_LINE;
char help_help_text_arr[] =
"Usage: flexnbd <cmd> [cmd options]\n\n"
"Commands:\n"
"\tflexnbd serve\n"
"\tflexnbd listen\n"
"\tflexnbd read\n"
"\tflexnbd write\n"
"\tflexnbd acl\n"
"\tflexnbd mirror\n"
"\tflexnbd mirror-speed\n"
"\tflexnbd break\n"
"\tflexnbd status\n"
"\tflexnbd help\n\n"
"See flexnbd help <cmd> for further info\n";
/* Slightly odd array/pointer pair to stop the compiler from complaining
* about symbol sizes
*/
char * help_help_text = help_help_text_arr;
void do_read(struct mode_readwrite_params* params);
void do_write(struct mode_readwrite_params* params);
void do_remote_command(char* command, char* mode, int argc, char** argv);
void read_serve_param( int c, char **ip_addr, char **ip_port, char **file, char **sock, int *default_deny, int *use_killswitch )
{
switch(c){
case 'h':
fprintf(stdout, "%s\n", serve_help_text );
exit( 0 );
break;
case 'l':
*ip_addr = optarg;
break;
case 'p':
*ip_port = optarg;
break;
case 'f':
*file = optarg;
break;
case 's':
*sock = optarg;
break;
case 'd':
*default_deny = 1;
break;
case 'q':
log_level = QUIET_LOG_LEVEL;
break;
case 'v':
log_level = VERBOSE_LOG_LEVEL;
break;
case 'k':
*use_killswitch = 1;
break;
default:
exit_err( serve_help_text );
break;
}
}
void read_listen_param( int c,
char **ip_addr,
char **ip_port,
char **file,
char **sock,
int *default_deny )
{
switch(c){
case 'h':
fprintf(stdout, "%s\n", listen_help_text );
exit(0);
break;
case 'l':
*ip_addr = optarg;
break;
case 'p':
*ip_port = optarg;
break;
case 'f':
*file = optarg;
break;
case 's':
*sock = optarg;
break;
case 'd':
*default_deny = 1;
break;
case 'q':
log_level = QUIET_LOG_LEVEL;
break;
case 'v':
log_level = VERBOSE_LOG_LEVEL;
break;
default:
exit_err( listen_help_text );
break;
}
}
void read_readwrite_param( int c, char **ip_addr, char **ip_port, char **bind_addr, char **from, char **size, char *err_text )
{
switch(c){
case 'h':
fprintf(stdout, "%s\n", err_text );
exit( 0 );
break;
case 'l':
*ip_addr = optarg;
break;
case 'p':
*ip_port = optarg;
break;
case 'F':
*from = optarg;
break;
case 'S':
*size = optarg;
break;
case 'b':
*bind_addr = optarg;
break;
case 'q':
log_level = QUIET_LOG_LEVEL;
break;
case 'v':
log_level = VERBOSE_LOG_LEVEL;
break;
default:
exit_err( err_text );
break;
}
}
void read_sock_param( int c, char **sock, char *help_text )
{
switch(c){
case 'h':
fprintf( stdout, "%s\n", help_text );
exit( 0 );
break;
case 's':
*sock = optarg;
break;
case 'q':
log_level = QUIET_LOG_LEVEL;
break;
case 'v':
log_level = VERBOSE_LOG_LEVEL;
break;
default:
exit_err( help_text );
break;
}
}
void read_acl_param( int c, char **sock )
{
read_sock_param( c, sock, acl_help_text );
}
void read_mirror_speed_param(
int c,
char **sock,
char **max_speed
)
{
switch( c ) {
case 'h':
fprintf( stdout, "%s\n", mirror_speed_help_text );
exit( 0 );
break;
case 's':
*sock = optarg;
break;
case 'm':
*max_speed = optarg;
break;
case 'q':
log_level = QUIET_LOG_LEVEL;
break;
case 'v':
log_level = VERBOSE_LOG_LEVEL;
break;
default:
exit_err( mirror_speed_help_text );
break;
}
}
void read_mirror_param(
int c,
char **sock,
char **ip_addr,
char **ip_port,
int *unlink,
char **bind_addr )
{
switch( c ){
case 'h':
fprintf( stdout, "%s\n", mirror_help_text );
exit( 0 );
break;
case 's':
*sock = optarg;
break;
case 'l':
*ip_addr = optarg;
break;
case 'p':
*ip_port = optarg;
break;
case 'u':
*unlink = 1;
break;
case 'b':
*bind_addr = optarg;
break;
case 'q':
log_level = QUIET_LOG_LEVEL;
break;
case 'v':
log_level = VERBOSE_LOG_LEVEL;
break;
default:
exit_err( mirror_help_text );
break;
}
}
void read_break_param( int c, char **sock )
{
switch( c ) {
case 'h':
fprintf( stdout, "%s\n", break_help_text );
exit( 0 );
break;
case 's':
*sock = optarg;
break;
case 'q':
log_level = QUIET_LOG_LEVEL;
break;
case 'v':
log_level = VERBOSE_LOG_LEVEL;
break;
default:
exit_err( break_help_text );
break;
}
}
void read_status_param( int c, char **sock )
{
read_sock_param( c, sock, status_help_text );
}
int mode_serve( int argc, char *argv[] )
{
int c;
char *ip_addr = NULL;
char *ip_port = NULL;
char *file = NULL;
char *sock = NULL;
int default_deny = 0; // not on by default
int use_killswitch = 0;
int err = 0;
int success;
struct flexnbd * flexnbd;
while (1) {
c = getopt_long(argc, argv, serve_short_options, serve_options, NULL);
if ( c == -1 ) { break; }
read_serve_param( c, &ip_addr, &ip_port, &file, &sock, &default_deny, &use_killswitch );
}
if ( NULL == ip_addr || NULL == ip_port ) {
err = 1;
fprintf( stderr, "both --addr and --port are required.\n" );
}
if ( NULL == file ) {
err = 1;
fprintf( stderr, "--file is required\n" );
}
if ( err ) { exit_err( serve_help_text ); }
flexnbd = flexnbd_create_serving( ip_addr, ip_port, file, sock, default_deny, argc - optind, argv + optind, MAX_NBD_CLIENTS, use_killswitch );
info( "Serving file %s", file );
success = flexnbd_serve( flexnbd );
flexnbd_destroy( flexnbd );
return success ? 0 : 1;
}
int mode_listen( int argc, char *argv[] )
{
int c;
char *ip_addr = NULL;
char *ip_port = NULL;
char *file = NULL;
char *sock = NULL;
int default_deny = 0; // not on by default
int err = 0;
int success;
struct flexnbd * flexnbd;
while (1) {
c = getopt_long(argc, argv, listen_short_options, listen_options, NULL);
if ( c == -1 ) { break; }
read_listen_param( c, &ip_addr, &ip_port,
&file, &sock, &default_deny );
}
if ( NULL == ip_addr || NULL == ip_port ) {
err = 1;
fprintf( stderr, "both --addr and --port are required.\n" );
}
if ( NULL == file ) {
err = 1;
fprintf( stderr, "--file is required\n" );
}
if ( err ) { exit_err( listen_help_text ); }
flexnbd = flexnbd_create_listening(
ip_addr,
ip_port,
file,
sock,
default_deny,
argc - optind,
argv + optind);
success = flexnbd_serve( flexnbd );
flexnbd_destroy( flexnbd );
return success ? 0 : 1;
}
/* TODO: Separate this function.
* It should be:
* params_read( struct mode_readwrite_params* out,
* char *s_ip_address,
* char *s_port,
* char *s_from,
* char *s_length )
* params_write( struct mode_readwrite_params* out,
* char *s_ip_address,
* char *s_port,
* char *s_from,
* char *s_length,
* char *s_filename )
*/
void params_readwrite(
int write_not_read,
struct mode_readwrite_params* out,
char* s_ip_address,
char* s_port,
char* s_bind_address,
char* s_from,
char* s_length_or_filename
)
{
FATAL_IF_NULL(s_ip_address, "No IP address supplied");
FATAL_IF_NULL(s_port, "No port number supplied");
FATAL_IF_NULL(s_from, "No from supplied");
FATAL_IF_NULL(s_length_or_filename, "No length supplied");
FATAL_IF_ZERO(
parse_ip_to_sockaddr(&out->connect_to.generic, s_ip_address),
"Couldn't parse connection address '%s'",
s_ip_address
);
if (s_bind_address != NULL &&
parse_ip_to_sockaddr(&out->connect_from.generic, s_bind_address) == 0) {
fatal("Couldn't parse bind address '%s'", s_bind_address);
}
parse_port( s_port, &out->connect_to.v4 );
out->from = atol(s_from);
if (write_not_read) {
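/* Treat the argument as a length only if it starts with a decimal digit;
* the old `c - 48 < 10` test was also true for '/', '.', '-' and so on. */
if (s_length_or_filename[0] >= '0' && s_length_or_filename[0] <= '9') {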
out->len = atol(s_length_or_filename);
out->data_fd = 0;
}
else {
out->data_fd = open(
s_length_or_filename, O_RDONLY);
FATAL_IF_NEGATIVE(out->data_fd,
"Couldn't open %s", s_length_or_filename);
out->len = lseek64(out->data_fd, 0, SEEK_END);
FATAL_IF_NEGATIVE(out->len,
"Couldn't find length of %s", s_length_or_filename);
FATAL_IF_NEGATIVE(
lseek64(out->data_fd, 0, SEEK_SET),
"Couldn't rewind %s", s_length_or_filename
);
}
}
else {
out->len = atol(s_length_or_filename);
out->data_fd = 1;
}
}
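The length-or-filename decision above relies on `atol`, which silently accepts trailing garbage. A minimal sketch of a stricter numeric check, assuming a hypothetical helper `demo_parse_u64` that is not part of flexnbd:

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper, not part of flexnbd: returns 1 and stores the value
 * when `text` consists entirely of a decimal unsigned number, 0 otherwise.
 * A caller could fall back to treating `text` as a filename on 0. */
static int demo_parse_u64(const char *text, uint64_t *out)
{
    char *end;
    unsigned long long value;

    /* Reject NULL, empty strings, signs, whitespace and paths up front. */
    if (text == NULL || text[0] < '0' || text[0] > '9') {
        return 0;
    }

    errno = 0;
    value = strtoull(text, &end, 10);
    if (errno == ERANGE || *end != '\0') {
        return 0;               /* overflow or trailing non-digits */
    }

    *out = (uint64_t) value;
    return 1;
}
```

With this shape, `"4096"` parses as a length while `"/dev/sda"` and `"12abc"` are rejected and can be handled as filenames or errors.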
int mode_read( int argc, char *argv[] )
{
int c;
char *ip_addr = NULL;
char *ip_port = NULL;
char *bind_addr = NULL;
char *from = NULL;
char *size = NULL;
int err = 0;
struct mode_readwrite_params readwrite;
while (1){
c = getopt_long(argc, argv, read_short_options, read_options, NULL);
if ( c == -1 ) { break; }
read_readwrite_param( c, &ip_addr, &ip_port, &bind_addr, &from, &size, read_help_text );
}
if ( NULL == ip_addr || NULL == ip_port ) {
err = 1;
fprintf( stderr, "both --addr and --port are required.\n" );
}
if ( NULL == from || NULL == size ) {
err = 1;
fprintf( stderr, "both --from and --size are required.\n" );
}
if ( err ) { exit_err( read_help_text ); }
memset( &readwrite, 0, sizeof( readwrite ) );
params_readwrite( 0, &readwrite, ip_addr, ip_port, bind_addr, from, size );
do_read( &readwrite );
return 0;
}
int mode_write( int argc, char *argv[] )
{
int c;
char *ip_addr = NULL;
char *ip_port = NULL;
char *bind_addr = NULL;
char *from = NULL;
char *size = NULL;
int err = 0;
struct mode_readwrite_params readwrite;
while (1){
c = getopt_long(argc, argv, write_short_options, write_options, NULL);
if ( c == -1 ) { break; }
read_readwrite_param( c, &ip_addr, &ip_port, &bind_addr, &from, &size, write_help_text );
}
if ( NULL == ip_addr || NULL == ip_port ) {
err = 1;
fprintf( stderr, "both --addr and --port are required.\n" );
}
if ( NULL == from || NULL == size ) {
err = 1;
fprintf( stderr, "both --from and --size are required.\n" );
}
if ( err ) { exit_err( write_help_text ); }
memset( &readwrite, 0, sizeof( readwrite ) );
params_readwrite( 1, &readwrite, ip_addr, ip_port, bind_addr, from, size );
do_write( &readwrite );
return 0;
}
int mode_acl( int argc, char *argv[] )
{
int c;
char *sock = NULL;
while (1) {
c = getopt_long( argc, argv, acl_short_options, acl_options, NULL );
if ( c == -1 ) { break; }
read_acl_param( c, &sock );
}
if ( NULL == sock ){
fprintf( stderr, "--sock is required.\n" );
exit_err( acl_help_text );
}
/* Don't use the CMD_ACL macro here, "acl" is the remote command
* name, not the cli option
*/
do_remote_command( "acl", sock, argc - optind, argv + optind );
return 0;
}
int mode_mirror_speed( int argc, char *argv[] )
{
int c;
char *sock = NULL;
char *speed = NULL;
while( 1 ) {
c = getopt_long( argc, argv, mirror_speed_short_options, mirror_speed_options, NULL );
if ( -1 == c ) { break; }
read_mirror_speed_param( c, &sock, &speed );
}
if ( NULL == sock ) {
fprintf( stderr, "--sock is required.\n" );
exit_err( mirror_speed_help_text );
}
if ( NULL == speed ) {
fprintf( stderr, "--max-speed is required.\n");
exit_err( mirror_speed_help_text );
}
do_remote_command( "mirror_max_bps", sock, 1, &speed );
return 0;
}
int mode_mirror( int argc, char *argv[] )
{
int c;
char *sock = NULL;
char *remote_argv[4] = {0};
int err = 0;
int unlink = 0;
remote_argv[2] = "exit";
while (1) {
c = getopt_long( argc, argv, mirror_short_options, mirror_options, NULL);
if ( -1 == c ) { break; }
read_mirror_param( c,
&sock,
&remote_argv[0],
&remote_argv[1],
&unlink,
&remote_argv[3] );
}
if ( NULL == sock ){
fprintf( stderr, "--sock is required.\n" );
err = 1;
}
if ( NULL == remote_argv[0] || NULL == remote_argv[1] ) {
fprintf( stderr, "both --addr and --port are required.\n");
err = 1;
}
if ( err ) { exit_err( mirror_help_text ); }
if ( unlink ) { remote_argv[2] = "unlink"; }
if (remote_argv[3] == NULL) {
do_remote_command( "mirror", sock, 3, remote_argv );
}
else {
do_remote_command( "mirror", sock, 4, remote_argv );
}
return 0;
}
int mode_break( int argc, char *argv[] )
{
int c;
char *sock = NULL;
while (1) {
c = getopt_long( argc, argv, break_short_options, break_options, NULL );
if ( -1 == c ) { break; }
read_break_param( c, &sock );
}
if ( NULL == sock ){
fprintf( stderr, "--sock is required.\n" );
exit_err( break_help_text );
}
do_remote_command( "break", sock, argc - optind, argv + optind );
return 0;
}
int mode_status( int argc, char *argv[] )
{
int c;
char *sock = NULL;
while (1) {
c = getopt_long( argc, argv, status_short_options, status_options, NULL );
if ( -1 == c ) { break; }
read_status_param( c, &sock );
}
if ( NULL == sock ){
fprintf( stderr, "--sock is required.\n" );
exit_err( status_help_text );
}
do_remote_command( "status", sock, argc - optind, argv + optind );
return 0;
}
int mode_help( int argc, char *argv[] )
{
char *cmd;
char *help_text = NULL;
if ( argc < 1 ){
help_text = help_help_text;
} else {
cmd = argv[0];
if (IS_CMD( CMD_SERVE, cmd ) ) {
help_text = serve_help_text;
} else if ( IS_CMD( CMD_LISTEN, cmd ) ) {
help_text = listen_help_text;
} else if ( IS_CMD( CMD_READ, cmd ) ) {
help_text = read_help_text;
} else if ( IS_CMD( CMD_WRITE, cmd ) ) {
help_text = write_help_text;
} else if ( IS_CMD( CMD_ACL, cmd ) ) {
help_text = acl_help_text;
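} else if ( IS_CMD( CMD_MIRROR_SPEED, cmd ) ) {
help_text = mirror_speed_help_text;
} else if ( IS_CMD( CMD_MIRROR, cmd ) ) {
help_text = mirror_help_text;
} else if ( IS_CMD( CMD_BREAK, cmd ) ) {
help_text = break_help_text;
} else if ( IS_CMD( CMD_STATUS, cmd ) ) {
help_text = status_help_text;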
} else { exit_err( help_help_text ); }
}
fprintf( stdout, "%s\n", help_text );
return 0;
}
void mode(char* mode, int argc, char **argv)
{
if ( IS_CMD( CMD_SERVE, mode ) ) {
exit( mode_serve( argc, argv ) );
}
else if ( IS_CMD( CMD_LISTEN, mode ) ) {
exit( mode_listen( argc, argv ) );
}
else if ( IS_CMD( CMD_READ, mode ) ) {
mode_read( argc, argv );
}
else if ( IS_CMD( CMD_WRITE, mode ) ) {
mode_write( argc, argv );
}
else if ( IS_CMD( CMD_ACL, mode ) ) {
mode_acl( argc, argv );
} else if ( IS_CMD ( CMD_MIRROR_SPEED, mode ) ) {
mode_mirror_speed( argc, argv );
}
else if ( IS_CMD( CMD_MIRROR, mode ) ) {
mode_mirror( argc, argv );
}
else if ( IS_CMD( CMD_BREAK, mode ) ) {
mode_break( argc, argv );
}
else if ( IS_CMD( CMD_STATUS, mode ) ) {
mode_status( argc, argv );
}
else if ( IS_CMD( CMD_HELP, mode ) ) {
mode_help( argc-1, argv+1 );
}
else {
mode_help( argc-1, argv+1 );
exit( 1 );
}
exit(0);
}

@@ -1,58 +0,0 @@
#include "nbdtypes.h"
#include <string.h>
#include <endian.h>
/**
* We intentionally ignore the reserved 128 bytes at the end of the
* request, since there's nothing we can do with them.
*/
void nbd_r2h_init( struct nbd_init_raw * from, struct nbd_init * to )
{
memcpy( to->passwd, from->passwd, 8 );
to->magic = be64toh( from->magic );
to->size = be64toh( from->size );
}
void nbd_h2r_init( struct nbd_init * from, struct nbd_init_raw * to)
{
memcpy( to->passwd, from->passwd, 8 );
to->magic = htobe64( from->magic );
to->size = htobe64( from->size );
}
void nbd_r2h_request( struct nbd_request_raw *from, struct nbd_request * to )
{
/* raw (big-endian) -> host: use the be*toh family */
to->magic = be32toh( from->magic );
to->type = be32toh( from->type );
memcpy( to->handle, from->handle, 8 );
to->from = be64toh( from->from );
to->len = be32toh( from->len );
}
void nbd_h2r_request( struct nbd_request * from, struct nbd_request_raw * to )
{
/* host -> raw (big-endian): use the htobe* family */
to->magic = htobe32( from->magic );
to->type = htobe32( from->type );
memcpy( to->handle, from->handle, 8 );
to->from = htobe64( from->from );
to->len = htobe32( from->len );
}
void nbd_r2h_reply( struct nbd_reply_raw * from, struct nbd_reply * to )
{
to->magic = be32toh( from->magic );
to->error = be32toh( from->error );
memcpy( to->handle, from->handle, 8 );
}
void nbd_h2r_reply( struct nbd_reply * from, struct nbd_reply_raw * to )
{
to->magic = htobe32( from->magic );
to->error = htobe32( from->error );
memcpy( to->handle, from->handle, 8 );
}
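The `_r2h_` and `_h2r_` helpers above translate between big-endian wire structs and host-order structs. A minimal sketch of the round trip, assuming hypothetical `demo_` stand-ins for `struct nbd_reply_raw` and `struct nbd_reply` (plain `uint32_t` is used in place of `__be32` so the example builds without kernel headers):

```c
#define _DEFAULT_SOURCE         /* for be32toh()/htobe32() in <endian.h> on glibc */
#include <endian.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the raw (wire) and host reply structs. */
struct demo_reply_raw { uint32_t magic; uint32_t error; char handle[8]; };
struct demo_reply     { uint32_t magic; uint32_t error; char handle[8]; };

/* raw (big-endian wire format) -> host byte order */
static void demo_r2h_reply(const struct demo_reply_raw *from, struct demo_reply *to)
{
    to->magic = be32toh(from->magic);
    to->error = be32toh(from->error);
    memcpy(to->handle, from->handle, 8);
}

/* host byte order -> raw (big-endian wire format) */
static void demo_h2r_reply(const struct demo_reply *from, struct demo_reply_raw *to)
{
    to->magic = htobe32(from->magic);
    to->error = htobe32(from->error);
    memcpy(to->handle, from->handle, 8);
}

/* Returns 1 when host -> raw -> host preserves every field and the first
 * byte of the raw magic is the most significant byte, as on the wire. */
static int demo_reply_round_trip(void)
{
    struct demo_reply host = { 0x67446698u, 5u, "handle!" };
    struct demo_reply_raw raw;
    struct demo_reply back;
    unsigned char first;

    demo_h2r_reply(&host, &raw);
    memcpy(&first, &raw.magic, 1);
    demo_r2h_reply(&raw, &back);

    return first == 0x67
        && back.magic == host.magic
        && back.error == host.error
        && memcmp(back.handle, host.handle, 8) == 0;
}
```

The handle is copied verbatim in both directions because it is an opaque byte string, not an integer.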

@@ -1,85 +0,0 @@
#ifndef __NBDTYPES_H
#define __NBDTYPES_H
/* http://linux.derkeiler.com/Mailing-Lists/Kernel/2003-09/2332.html */
#define INIT_PASSWD "NBDMAGIC"
#define INIT_MAGIC 0x0000420281861253
#define REQUEST_MAGIC 0x25609513
#define REPLY_MAGIC 0x67446698
#define REQUEST_READ 0
#define REQUEST_WRITE 1
#define REQUEST_DISCONNECT 2
/* The top 2 bytes of the type field are overloaded and can contain flags */
#define REQUEST_MASK 0x0000ffff
/* 1MiB is the de-facto standard for maximum size of header + data */
#define NBD_MAX_SIZE ( 1024 * 1024 )
#define NBD_REQUEST_SIZE ( sizeof( struct nbd_request_raw ) )
#define NBD_REPLY_SIZE ( sizeof( struct nbd_reply_raw ) )
#include <linux/types.h>
#include <inttypes.h>
/* The _raw types are the types as they appear on the wire. Non-_raw
* types are in host-format.
* Conversion functions are _r2h_ for converting raw to host, and _h2r_
* for converting host to raw.
*/
struct nbd_init_raw {
char passwd[8];
__be64 magic;
__be64 size;
char reserved[128];
};
struct nbd_request_raw {
__be32 magic;
__be32 type; /* == READ || == WRITE */
char handle[8];
__be64 from;
__be32 len;
} __attribute__((packed));
struct nbd_reply_raw {
__be32 magic;
__be32 error; /* 0 = ok, else error */
char handle[8]; /* handle you got from request */
};
struct nbd_init {
char passwd[8];
uint64_t magic;
uint64_t size;
char reserved[128];
};
struct nbd_request {
uint32_t magic;
uint32_t type; /* == READ || == WRITE || == DISCONNECT */
char handle[8];
uint64_t from;
uint32_t len;
} __attribute__((packed));
struct nbd_reply {
uint32_t magic;
uint32_t error; /* 0 = ok, else error */
char handle[8]; /* handle you got from request */
};
void nbd_r2h_init( struct nbd_init_raw * from, struct nbd_init * to );
void nbd_r2h_request( struct nbd_request_raw *from, struct nbd_request * to );
void nbd_r2h_reply( struct nbd_reply_raw * from, struct nbd_reply * to );
void nbd_h2r_init( struct nbd_init * from, struct nbd_init_raw * to);
void nbd_h2r_request( struct nbd_request * from, struct nbd_request_raw * to );
void nbd_h2r_reply( struct nbd_reply * from, struct nbd_reply_raw * to );
#endif
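The header notes that the top 2 bytes of the request `type` field can carry flags, with `REQUEST_MASK` selecting the command. A small sketch of that split, assuming hypothetical helper names and a made-up flag value:

```c
#include <stdint.h>

/* Same value as REQUEST_MASK in the header above, renamed for the sketch. */
#define DEMO_REQUEST_MASK 0x0000ffffu

/* Hypothetical helper: the command lives in the low 16 bits of `type`. */
static uint32_t demo_request_command(uint32_t type)
{
    return type & DEMO_REQUEST_MASK;
}

/* Hypothetical helper: whatever flags were sent sit in the high 16 bits. */
static uint32_t demo_request_flags(uint32_t type)
{
    return type >> 16;
}
```

For example, a host-order `type` of `0x00010001` (an illustrative value) would be a write request (command 1) with flag bit 0 set.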

@@ -1,127 +0,0 @@
#include "parse.h"
#include "util.h"
#include <stdlib.h> /* atoi() */
#define IS_IP_VALID_CHAR(x) ( ((x) >= '0' && (x) <= '9' ) || \
((x) >= 'a' && (x) <= 'f') || \
((x) >= 'A' && (x) <= 'F' ) || \
(x) == ':' || (x) == '.' \
)
/* FIXME: should change this to return negative on error like everything else */
int parse_ip_to_sockaddr(struct sockaddr* out, char* src)
{
NULLCHECK( out );
NULLCHECK( src );
char temp[64];
struct sockaddr_in *v4 = (struct sockaddr_in *) out;
struct sockaddr_in6 *v6 = (struct sockaddr_in6 *) out;
/* allow user to start with [ and end with any other invalid char */
{
int i=0, j=0;
if (src[i] == '[') { i++; }
/* bound on j so the terminating NUL always fits inside temp[] */
for (; j < (int) sizeof(temp) - 1 && IS_IP_VALID_CHAR(src[i]); i++) {
temp[j++] = src[i];
}
temp[j] = 0;
}
if (temp[0] == '0' && temp[1] == '\0') {
v4->sin_family = AF_INET;
v4->sin_addr.s_addr = INADDR_ANY;
return 1;
}
if (inet_pton(AF_INET, temp, &v4->sin_addr) == 1) {
out->sa_family = AF_INET;
return 1;
}
if (inet_pton(AF_INET6, temp, &v6->sin6_addr) == 1) {
out->sa_family = AF_INET6;
return 1;
}
return 0;
}
int parse_to_sockaddr(struct sockaddr* out, char* address)
{
struct sockaddr_un* un = (struct sockaddr_un*) out;
NULLCHECK( address );
if ( address[0] == '/' ) {
un->sun_family = AF_UNIX;
strncpy( un->sun_path, address, sizeof( un->sun_path ) - 1 );
un->sun_path[ sizeof( un->sun_path ) - 1 ] = '\0'; /* strncpy may not terminate */
return 1;
}
return parse_ip_to_sockaddr( out, address );
}
int parse_acl(struct ip_and_mask (**out)[], int max, char **entries)
{
struct ip_and_mask* list;
int i;
if (max == 0) {
*out = NULL;
return 0;
}
else {
list = xmalloc(max * sizeof(struct ip_and_mask));
*out = (struct ip_and_mask (*)[])list;
debug("acl alloc: %p", *out);
}
for (i = 0; i < max; i++) {
int j;
struct ip_and_mask* outentry = &list[i];
# define MAX_MASK_BITS (outentry->ip.family == AF_INET ? 32 : 128)
if (parse_ip_to_sockaddr(&outentry->ip.generic, entries[i]) == 0) {
return i;
}
for (j=0; entries[i][j] && entries[i][j] != '/'; j++)
; // increment j!
if (entries[i][j] == '/') {
outentry->mask = atoi(entries[i]+j+1);
if (outentry->mask < 1 || outentry->mask > MAX_MASK_BITS) {
return i;
}
}
else {
outentry->mask = MAX_MASK_BITS;
}
# undef MAX_MASK_BITS
debug("acl ptr[%d]: %p %d",i, outentry, outentry->mask);
}
for (i=0; i < max; i++) {
debug("acl entry %d @ %p has mask %d", i, (void *) &list[i], list[i].mask);
}
return max;
}
void parse_port( char *s_port, struct sockaddr_in *out )
{
NULLCHECK( s_port );
int raw_port;
raw_port = atoi( s_port );
if ( raw_port < 0 || raw_port > 65535 ) {
fatal( "Port number must be >= 0 and <= 65535" );
}
out->sin_port = htobe16( raw_port );
}
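`parse_acl` above accepts entries of the form `ADDR` or `ADDR/BITS` and bounds the mask by the address family. A self-contained sketch of that rule, assuming a hypothetical helper `demo_parse_acl_entry`; unlike `parse_ip_to_sockaddr` it does not strip a leading `[`, and it uses `strtol` rather than `atoi` so trailing garbage is rejected:

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper, not part of flexnbd. Parses "ADDR" or "ADDR/BITS".
 * Returns the prefix length (defaulting to the family maximum) and stores
 * the address family in *family, or returns -1 on any parse error. */
static int demo_parse_acl_entry(const char *entry, int *family)
{
    char addr[64];
    unsigned char buf[sizeof(struct in6_addr)];
    const char *slash = strchr(entry, '/');
    size_t addr_len = slash ? (size_t) (slash - entry) : strlen(entry);
    int max_bits;
    long bits;
    char *end;

    /* Bounded copy of the address part so the NUL always fits. */
    if (addr_len == 0 || addr_len >= sizeof(addr)) {
        return -1;
    }
    memcpy(addr, entry, addr_len);
    addr[addr_len] = '\0';

    if (inet_pton(AF_INET, addr, buf) == 1) {
        *family = AF_INET;
        max_bits = 32;
    } else if (inet_pton(AF_INET6, addr, buf) == 1) {
        *family = AF_INET6;
        max_bits = 128;
    } else {
        return -1;
    }

    if (slash == NULL) {
        return max_bits;        /* no mask given: match the whole address */
    }

    bits = strtol(slash + 1, &end, 10);
    if (end == slash + 1 || *end != '\0' || bits < 1 || bits > max_bits) {
        return -1;
    }
    return (int) bits;
}
```

Here `"192.168.0.0/24"` yields 24 for `AF_INET`, `"::1"` yields 128 for `AF_INET6`, and `"10.0.0.1/33"` is rejected.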

@@ -1,29 +0,0 @@
#ifndef PARSE_H
#define PARSE_H
#include <sys/socket.h>
#include <sys/un.h>
#include <arpa/inet.h>
#include <unistd.h>
union mysockaddr {
unsigned short family;
struct sockaddr generic;
struct sockaddr_in v4;
struct sockaddr_in6 v6;
struct sockaddr_un un;
};
struct ip_and_mask {
union mysockaddr ip;
int mask;
};
int parse_ip_to_sockaddr(struct sockaddr* out, char* src);
int parse_to_sockaddr(struct sockaddr* out, char* src);
int parse_acl(struct ip_and_mask (**out)[], int max, char **entries);
void parse_port( char *s_port, struct sockaddr_in *out );
#endif

@@ -1,14 +0,0 @@
#ifndef PREFETCH_H
#define PREFETCH_H
#define PREFETCH_BUFSIZE 4096
struct prefetch {
int is_full;
__be64 from;
__be32 len;
char buffer[PREFETCH_BUFSIZE];
};
#endif

@@ -1,4 +1,6 @@
#include <signal.h>
#include <stdlib.h>
#include <time.h>
#include "mode.h"
#include "util.h"
@@ -6,152 +8,158 @@
static struct option proxy_options[] = {
GETOPT_HELP,
GETOPT_ADDR,
GETOPT_PORT,
GETOPT_CONNECT_ADDR,
GETOPT_CONNECT_PORT,
GETOPT_BIND,
GETOPT_QUIET,
GETOPT_VERBOSE,
{0}
GETOPT_HELP,
GETOPT_ADDR,
GETOPT_PORT,
GETOPT_CONNECT_ADDR,
GETOPT_CONNECT_PORT,
GETOPT_BIND,
GETOPT_CACHE,
GETOPT_QUIET,
GETOPT_VERBOSE,
{0}
};
static char proxy_short_options[] = "hl:p:C:P:b:" SOPT_QUIET SOPT_VERBOSE;
static char proxy_help_text[] =
"Usage: flexnbd-proxy <options>\n\n"
"Resiliently proxy an NBD connection between client and server\n"
"We can listen on TCP or UNIX socket, but only connect to TCP servers.\n\n"
HELP_LINE
"\t--" OPT_ADDR ",-l <ADDR>\tThe address we will bind to as a proxy.\n"
"\t--" OPT_PORT ",-p <PORT>\tThe port we will bind to as a proxy, if required.\n"
"\t--" OPT_CONNECT_ADDR ",-C <ADDR>\tAddress of the proxied server.\n"
"\t--" OPT_CONNECT_PORT ",-P <PORT>\tPort of the proxied server.\n"
"\t--" OPT_BIND ",-b <ADDR>\tThe address we connect from, as a proxy.\n"
QUIET_LINE
VERBOSE_LINE;
"Usage: flexnbd-proxy <options>\n\n"
"Resiliently proxy an NBD connection between client and server\n"
"We can listen on TCP or UNIX socket, but only connect to TCP servers.\n\n"
HELP_LINE
"\t--" OPT_ADDR ",-l <ADDR>\tThe address we will bind to as a proxy.\n"
"\t--" OPT_PORT
",-p <PORT>\tThe port we will bind to as a proxy, if required.\n"
"\t--" OPT_CONNECT_ADDR ",-C <ADDR>\tAddress of the proxied server.\n"
"\t--" OPT_CONNECT_PORT ",-P <PORT>\tPort of the proxied server.\n"
"\t--" OPT_BIND
",-b <ADDR>\tThe address we connect from, as a proxy.\n" "\t--"
OPT_CACHE
",-c[=<CACHE-BYTES>]\tUse a RAM read cache of the given size.\n"
QUIET_LINE VERBOSE_LINE;
void read_proxy_param(
int c,
char **downstream_addr,
char **downstream_port,
char **upstream_addr,
char **upstream_port,
char **bind_addr )
static char proxy_default_cache_size[] = "4096";
void read_proxy_param(int c,
char **downstream_addr,
char **downstream_port,
char **upstream_addr,
char **upstream_port,
char **bind_addr, char **cache_bytes)
{
switch( c ) {
case 'h' :
fprintf( stdout, "%s\n", proxy_help_text );
exit( 0 );
break;
case 'l':
*downstream_addr = optarg;
break;
case 'p':
*downstream_port = optarg;
break;
case 'C':
*upstream_addr = optarg;
break;
case 'P':
*upstream_port = optarg;
break;
case 'b':
*bind_addr = optarg;
break;
case 'q':
log_level = QUIET_LOG_LEVEL;
break;
case 'v':
log_level = VERBOSE_LOG_LEVEL;
break;
default:
exit_err( proxy_help_text );
break;
}
switch (c) {
case 'h':
fprintf(stdout, "%s\n", proxy_help_text);
exit(0);
case 'l':
*downstream_addr = optarg;
break;
case 'p':
*downstream_port = optarg;
break;
case 'C':
*upstream_addr = optarg;
break;
case 'P':
*upstream_port = optarg;
break;
case 'b':
*bind_addr = optarg;
break;
case 'c':
*cache_bytes = optarg ? optarg : proxy_default_cache_size;
break;
case 'q':
log_level = QUIET_LOG_LEVEL;
break;
case 'v':
log_level = VERBOSE_LOG_LEVEL;
break;
default:
exit_err(proxy_help_text);
break;
}
}
struct proxier * proxy = NULL;
struct proxier *proxy = NULL;
void my_exit(int signum)
{
info( "Exit signalled (%i)", signum );
if ( NULL != proxy ) {
proxy_cleanup( proxy );
};
exit( 0 );
info("Exit signalled (%i)", signum);
if (NULL != proxy) {
proxy_cleanup(proxy);
};
exit(0);
}
int main( int argc, char *argv[] )
int main(int argc, char *argv[])
{
int c;
char *downstream_addr = NULL;
char *downstream_port = NULL;
char *upstream_addr = NULL;
char *upstream_port = NULL;
char *bind_addr = NULL;
int success;
int c;
char *downstream_addr = NULL;
char *downstream_port = NULL;
char *upstream_addr = NULL;
char *upstream_port = NULL;
char *bind_addr = NULL;
char *cache_bytes = NULL;
int success;
sigset_t mask;
struct sigaction exit_action;
sigset_t mask;
struct sigaction exit_action;
sigemptyset( &mask );
sigaddset( &mask, SIGTERM );
sigaddset( &mask, SIGQUIT );
sigaddset( &mask, SIGINT );
sigemptyset(&mask);
sigaddset(&mask, SIGTERM);
sigaddset(&mask, SIGQUIT);
sigaddset(&mask, SIGINT);
exit_action.sa_handler = my_exit;
exit_action.sa_mask = mask;
exit_action.sa_flags = 0;
exit_action.sa_handler = my_exit;
exit_action.sa_mask = mask;
exit_action.sa_flags = 0;
while (1) {
c = getopt_long( argc, argv, proxy_short_options, proxy_options, NULL );
if ( -1 == c ) { break; }
read_proxy_param( c,
&downstream_addr,
&downstream_port,
&upstream_addr,
&upstream_port,
&bind_addr
);
srand(time(NULL));
while (1) {
c = getopt_long(argc, argv, proxy_short_options, proxy_options,
NULL);
if (-1 == c) {
break;
}
read_proxy_param(c,
&downstream_addr,
&downstream_port,
&upstream_addr,
&upstream_port, &bind_addr, &cache_bytes);
}
if ( NULL == downstream_addr ){
fprintf( stderr, "--addr is required.\n" );
exit_err( proxy_help_text );
} else if ( NULL == upstream_addr || NULL == upstream_port ){
fprintf( stderr, "both --conn-addr and --conn-port are required.\n" );
exit_err( proxy_help_text );
}
if (NULL == downstream_addr) {
fprintf(stderr, "--addr is required.\n");
exit_err(proxy_help_text);
} else if (NULL == upstream_addr || NULL == upstream_port) {
fprintf(stderr,
"both --conn-addr and --conn-port are required.\n");
exit_err(proxy_help_text);
}
proxy = proxy_create(
downstream_addr,
downstream_port,
upstream_addr,
upstream_port,
bind_addr
);
proxy = proxy_create(downstream_addr,
downstream_port,
upstream_addr,
upstream_port, bind_addr, cache_bytes);
/* Set these *after* proxy has been assigned to */
sigaction(SIGTERM, &exit_action, NULL);
sigaction(SIGQUIT, &exit_action, NULL);
sigaction(SIGINT, &exit_action, NULL);
signal(SIGPIPE, SIG_IGN); /* calls to splice() unhelpfully throw this */
/* Set these *after* proxy has been assigned to */
sigaction(SIGTERM, &exit_action, NULL);
sigaction(SIGQUIT, &exit_action, NULL);
sigaction(SIGINT, &exit_action, NULL);
signal(SIGPIPE, SIG_IGN); /* calls to splice() unhelpfully throw this */
if ( NULL != downstream_port ) {
info(
"Proxying between %s %s (downstream) and %s %s (upstream)",
downstream_addr, downstream_port, upstream_addr, upstream_port
);
} else {
info(
"Proxying between %s (downstream) and %s %s (upstream)",
downstream_addr, upstream_addr, upstream_port
);
}
if (NULL != downstream_port) {
info("Proxying between %s %s (downstream) and %s %s (upstream)",
downstream_addr, downstream_port, upstream_addr,
upstream_port);
} else {
info("Proxying between %s (downstream) and %s %s (upstream)",
downstream_addr, upstream_addr, upstream_port);
}
success = do_proxy( proxy );
proxy_destroy( proxy );
success = do_proxy(proxy);
proxy_destroy(proxy);
return success ? 0 : 1;
return success ? 0 : 1;
}

@@ -1,933 +0,0 @@
#include "proxy.h"
#include "readwrite.h"
#ifdef PREFETCH
#include "prefetch.h"
#endif
#include "ioutil.h"
#include "sockutil.h"
#include "util.h"
#include <errno.h>
#include <sys/socket.h>
#include <netinet/tcp.h>
struct proxier* proxy_create(
char* s_downstream_address,
char* s_downstream_port,
char* s_upstream_address,
char* s_upstream_port,
char* s_upstream_bind )
{
struct proxier* out;
out = xmalloc( sizeof( struct proxier ) );
FATAL_IF_NULL(s_downstream_address, "Listen address not specified");
NULLCHECK( s_downstream_address );
FATAL_UNLESS(
parse_to_sockaddr( &out->listen_on.generic, s_downstream_address ),
"Couldn't parse downstream address '%s'",
s_downstream_address
);
if ( out->listen_on.family != AF_UNIX ) {
FATAL_IF_NULL( s_downstream_port, "Downstream port not specified" );
NULLCHECK( s_downstream_port );
parse_port( s_downstream_port, &out->listen_on.v4 );
}
FATAL_IF_NULL(s_upstream_address, "Upstream address not specified");
NULLCHECK( s_upstream_address );
FATAL_UNLESS(
parse_ip_to_sockaddr( &out->connect_to.generic, s_upstream_address ),
"Couldn't parse upstream address '%s'",
s_upstream_address
);
FATAL_IF_NULL( s_upstream_port, "Upstream port not specified" );
NULLCHECK( s_upstream_port );
parse_port( s_upstream_port, &out->connect_to.v4 );
if ( s_upstream_bind ) {
FATAL_IF_ZERO(
parse_ip_to_sockaddr( &out->connect_from.generic, s_upstream_bind ),
"Couldn't parse bind address '%s'",
s_upstream_bind
);
out->bind = 1;
}
out->listen_fd = -1;
out->downstream_fd = -1;
out->upstream_fd = -1;
#ifdef PREFETCH
out->prefetch = xmalloc( sizeof( struct prefetch ) );
#endif
out->init.buf = xmalloc( sizeof( struct nbd_init_raw ) );
out->req.buf = xmalloc( NBD_MAX_SIZE );
out->rsp.buf = xmalloc( NBD_MAX_SIZE );
return out;
}
void proxy_destroy( struct proxier* proxy )
{
free( proxy->init.buf );
free( proxy->req.buf );
free( proxy->rsp.buf );
#ifdef PREFETCH
free( proxy->prefetch );
#endif
free( proxy );
}
/* Shared between our two different connect_to_upstream paths */
void proxy_finish_connect_to_upstream( struct proxier *proxy, off64_t size );
/* Try to establish a connection to our upstream server. Return 1 on success,
* 0 on failure. This is a blocking call that returns a non-blocking socket.
*/
int proxy_connect_to_upstream( struct proxier* proxy )
{
struct sockaddr* connect_from = NULL;
if ( proxy->bind ) {
connect_from = &proxy->connect_from.generic;
}
int fd = socket_connect( &proxy->connect_to.generic, connect_from );
off64_t size = 0;
if ( -1 == fd ) {
return 0;
}
if( !socket_nbd_read_hello( fd, &size ) ) {
WARN_IF_NEGATIVE(
sock_try_close( fd ),
"Couldn't close() after failed read of NBD hello on fd %i", fd
);
return 0;
}
proxy->upstream_fd = fd;
sock_set_nonblock( fd, 1 );
proxy_finish_connect_to_upstream( proxy, size );
return 1;
}
/* First half of non-blocking connection to upstream. Gets as far as calling
* connect() on a non-blocking socket.
*/
void proxy_start_connect_to_upstream( struct proxier* proxy )
{
int fd, result;
struct sockaddr* from = NULL;
struct sockaddr* to = &proxy->connect_to.generic;
if ( proxy->bind ) {
from = &proxy->connect_from.generic;
}
fd = socket( to->sa_family , SOCK_STREAM, 0 );
if( fd < 0 ) {
warn( SHOW_ERRNO( "Couldn't create socket to reconnect to upstream" ) );
return;
}
info( "Beginning non-blocking connection to upstream on fd %i", fd );
if ( NULL != from ) {
if ( 0 > bind( fd, from, sockaddr_size( from ) ) ) {
warn( SHOW_ERRNO( "bind() to source address failed" ) );
}
}
result = sock_set_nonblock( fd, 1 );
if ( result == -1 ) {
warn( SHOW_ERRNO( "Failed to set upstream fd %i non-blocking", fd ) );
goto error;
}
result = connect( fd, to, sockaddr_size( to ) );
if ( result == -1 && errno != EINPROGRESS ) {
warn( SHOW_ERRNO( "Failed to start connect()ing to upstream!" ) );
goto error;
}
proxy->upstream_fd = fd;
return;
error:
if ( sock_try_close( fd ) == -1 ) {
/* Non-fatal leak, although still nasty */
warn( SHOW_ERRNO( "Failed to close fd for upstream %i", fd ) );
}
return;
}
void proxy_finish_connect_to_upstream( struct proxier *proxy, off64_t size ) {
if ( proxy->upstream_size == 0 ) {
info( "Size of upstream image is %"PRIu64" bytes", size );
} else if ( proxy->upstream_size != size ) {
warn(
"Size changed from %"PRIu64" to %"PRIu64" bytes",
proxy->upstream_size, size
);
}
proxy->upstream_size = size;
info( "Connected to upstream on fd %i", proxy->upstream_fd );
return;
}
void proxy_disconnect_from_upstream( struct proxier* proxy )
{
if ( -1 != proxy->upstream_fd ) {
info("Closing upstream connection on fd %i", proxy->upstream_fd );
/* TODO: An NBD disconnect would be pleasant here */
WARN_IF_NEGATIVE(
sock_try_close( proxy->upstream_fd ),
"Failed to close() fd %i when disconnecting from upstream",
proxy->upstream_fd
);
proxy->upstream_fd = -1;
}
}
/** Prepares a listening socket for the NBD server, binding etc. */
void proxy_open_listen_socket(struct proxier* params)
{
NULLCHECK( params );
params->listen_fd = socket(params->listen_on.family, SOCK_STREAM, 0);
FATAL_IF_NEGATIVE(
params->listen_fd, SHOW_ERRNO( "Couldn't create listen socket" )
);
/* Allow us to restart quickly */
FATAL_IF_NEGATIVE(
sock_set_reuseaddr(params->listen_fd, 1),
SHOW_ERRNO( "Couldn't set SO_REUSEADDR" )
);
if( AF_UNIX != params->listen_on.family ) {
FATAL_IF_NEGATIVE(
sock_set_tcp_nodelay(params->listen_fd, 1),
SHOW_ERRNO( "Couldn't set TCP_NODELAY" )
);
}
FATAL_UNLESS_ZERO(
sock_try_bind( params->listen_fd, &params->listen_on.generic ),
SHOW_ERRNO( "Failed to bind to listening socket" )
);
/* We're only serving one client at a time, hence backlog of 1 */
FATAL_IF_NEGATIVE(
listen(params->listen_fd, 1),
SHOW_ERRNO( "Failed to listen on listening socket" )
);
info( "Now listening for incoming connections" );
return;
}
typedef enum {
EXIT,
WRITE_TO_DOWNSTREAM,
READ_FROM_DOWNSTREAM,
CONNECT_TO_UPSTREAM,
READ_INIT_FROM_UPSTREAM,
WRITE_TO_UPSTREAM,
READ_FROM_UPSTREAM
} proxy_session_states;
static char* proxy_session_state_names[] = {
"EXIT",
"WRITE_TO_DOWNSTREAM",
"READ_FROM_DOWNSTREAM",
"CONNECT_TO_UPSTREAM",
"READ_INIT_FROM_UPSTREAM",
"WRITE_TO_UPSTREAM",
"READ_FROM_UPSTREAM"
};
static inline int proxy_state_upstream( int state )
{
return state == CONNECT_TO_UPSTREAM || state == READ_INIT_FROM_UPSTREAM ||
state == WRITE_TO_UPSTREAM || state == READ_FROM_UPSTREAM;
}
#ifdef PREFETCH
int proxy_prefetch_for_request( struct proxier* proxy, int state )
{
struct nbd_request* req = &proxy->req_hdr;
struct nbd_reply* rsp = &proxy->rsp_hdr;
struct nbd_request_raw* req_raw = (struct nbd_request_raw*) proxy->req.buf;
struct nbd_reply_raw *rsp_raw = (struct nbd_reply_raw*) proxy->rsp.buf;
int is_read = ( req->type & REQUEST_MASK ) == REQUEST_READ;
int prefetch_start = req->from;
int prefetch_end = req->from + ( req->len * 2 );
/* We only want to consider prefetching if we know we're not
* getting too much data back, if it's a read request, and if
* the prefetch won't try to read past the end of the file.
*/
int prefetching = req->len <= PREFETCH_BUFSIZE && is_read &&
prefetch_start < prefetch_end && prefetch_end <= proxy->upstream_size;
if ( is_read ) {
/* See if we can respond with what's in our prefetch
* cache */
if ( proxy->prefetch->is_full &&
req->from == proxy->prefetch->from &&
req->len == proxy->prefetch->len ) {
/* HUZZAH! A match! */
debug( "Prefetch hit!" );
/* First build a reply header */
rsp->magic = REPLY_MAGIC;
rsp->error = 0;
memcpy( &rsp->handle, &req->handle, 8 );
/* now copy it into the response */
nbd_h2r_reply( rsp, rsp_raw );
/* and the data */
memcpy(
proxy->rsp.buf + NBD_REPLY_SIZE,
proxy->prefetch->buffer, proxy->prefetch->len
);
proxy->rsp.size = NBD_REPLY_SIZE + proxy->prefetch->len;
proxy->rsp.needle = 0;
/* return early, our work here is done */
return WRITE_TO_DOWNSTREAM;
}
}
else {
/* Safety catch. If we're sending a write request, we
* blow away the cache. This is very pessimistic, but
* it's simpler (and therefore safer) than working out
* whether we can keep it or not.
*/
debug( "Blowing away prefetch cache on type %d request.", req->type );
proxy->prefetch->is_full = 0;
}
debug( "Prefetch cache MISS!");
/* We pull the request out of the proxy struct, rewrite the
* request size, and write it back.
*/
if ( prefetching ) {
proxy->is_prefetch_req = 1;
proxy->prefetch_req_orig_len = req->len;
req->len *= 2;
debug( "Prefetching %"PRIu32" bytes", req->len - proxy->prefetch_req_orig_len );
nbd_h2r_request( req, req_raw );
}
return state;
}
int proxy_prefetch_for_reply( struct proxier* proxy, int state )
{
size_t prefetched_bytes;
if ( !proxy->is_prefetch_req ) {
return state;
}
prefetched_bytes = proxy->req_hdr.len - proxy->prefetch_req_orig_len;
debug( "Prefetched %zu bytes", prefetched_bytes );
memcpy(
proxy->rsp.buf + proxy->prefetch_req_orig_len,
&(proxy->prefetch->buffer),
prefetched_bytes
);
proxy->prefetch->from = proxy->req_hdr.from + proxy->prefetch_req_orig_len;
proxy->prefetch->len = prefetched_bytes;
/* We've finished with proxy->req by now, so don't need to alter it to make
* it look like the request was before prefetch */
/* Truncate the bytes we'll write downstream */
proxy->req_hdr.len = proxy->prefetch_req_orig_len;
proxy->rsp.size -= prefetched_bytes;
/* And we need to reset these */
proxy->prefetch->is_full = 1;
proxy->is_prefetch_req = 0;
return state;
}
#endif
int proxy_read_from_downstream( struct proxier *proxy, int state )
{
ssize_t count;
struct nbd_request_raw* request_raw = (struct nbd_request_raw*) proxy->req.buf;
struct nbd_request* request = &(proxy->req_hdr);
// assert( state == READ_FROM_DOWNSTREAM );
count = iobuf_read( proxy->downstream_fd, &proxy->req, NBD_REQUEST_SIZE );
if ( count == -1 ) {
warn( SHOW_ERRNO( "Couldn't read request from downstream" ) );
return EXIT;
}
if ( proxy->req.needle == NBD_REQUEST_SIZE ) {
nbd_r2h_request( request_raw, request );
if ( ( request->type & REQUEST_MASK ) == REQUEST_DISCONNECT ) {
info( "Received disconnect request from client" );
return EXIT;
}
/* Simple validations */
if ( ( request->type & REQUEST_MASK ) == REQUEST_READ ) {
if (request->len > ( NBD_MAX_SIZE - NBD_REPLY_SIZE ) ) {
warn( "NBD read request size %"PRIu32" too large", request->len );
return EXIT;
}
}
if ( (request->type & REQUEST_MASK ) == REQUEST_WRITE ) {
if (request->len > ( NBD_MAX_SIZE - NBD_REQUEST_SIZE ) ) {
warn( "NBD write request size %"PRIu32" too large", request->len );
return EXIT;
}
proxy->req.size += request->len;
}
}
if ( proxy->req.needle == proxy->req.size ) {
debug(
"Received NBD request from downstream. type=%"PRIu32" from=%"PRIu64" len=%"PRIu32,
request->type, request->from, request->len
);
/* Finished reading, so advance state. Leave size untouched so the next
* state knows how many bytes to write */
proxy->req.needle = 0;
return WRITE_TO_UPSTREAM;
}
return state;
}
int proxy_continue_connecting_to_upstream( struct proxier* proxy, int state )
{
int error, result;
socklen_t len = sizeof( error );
// assert( state == CONNECT_TO_UPSTREAM );
result = getsockopt(
proxy->upstream_fd, SOL_SOCKET, SO_ERROR, &error, &len
);
if ( result == -1 ) {
warn( SHOW_ERRNO( "Failed to tell if connected to upstream" ) );
return state;
}
if ( error != 0 ) {
errno = error;
warn( SHOW_ERRNO( "Failed to connect to upstream" ) );
return state;
}
#ifdef PREFETCH
/* Data may have changed while we were disconnected */
proxy->prefetch->is_full = 0;
#endif
info( "Connected to upstream on fd %i", proxy->upstream_fd );
return READ_INIT_FROM_UPSTREAM;
}
int proxy_read_init_from_upstream( struct proxier* proxy, int state )
{
ssize_t count;
// assert( state == READ_INIT_FROM_UPSTREAM );
count = iobuf_read( proxy->upstream_fd, &proxy->init, sizeof( struct nbd_init_raw ) );
if ( count == -1 ) {
warn( SHOW_ERRNO( "Failed to read init from upstream" ) );
goto disconnect;
}
if ( proxy->init.needle == proxy->init.size ) {
off64_t upstream_size;
if ( !nbd_check_hello( (struct nbd_init_raw*) proxy->init.buf, &upstream_size ) ) {
warn( "Upstream sent invalid init" );
goto disconnect;
}
/* Currently, we only get disconnected from upstream (so needing to come
* here) when we have an outstanding request. If that becomes false,
* we'll need to choose the right state to return to here */
proxy->init.needle = 0;
return WRITE_TO_UPSTREAM;
}
return state;
disconnect:
proxy->init.needle = 0;
proxy->init.size = 0;
return CONNECT_TO_UPSTREAM;
}
int proxy_write_to_upstream( struct proxier* proxy, int state )
{
ssize_t count;
// assert( state == WRITE_TO_UPSTREAM );
count = iobuf_write( proxy->upstream_fd, &proxy->req );
if ( count == -1 ) {
warn( SHOW_ERRNO( "Failed to send request to upstream" ) );
proxy->req.needle = 0;
return CONNECT_TO_UPSTREAM;
}
if ( proxy->req.needle == proxy->req.size ) {
/* Request sent. Advance to reading the response from upstream. We might
* still need req.size if reading the reply fails - we disconnect
* and resend the request in that case - so keep it around for now. */
proxy->req.needle = 0;
return READ_FROM_UPSTREAM;
}
return state;
}
int proxy_read_from_upstream( struct proxier* proxy, int state )
{
ssize_t count;
struct nbd_reply* reply = &(proxy->rsp_hdr);
struct nbd_reply_raw* reply_raw = (struct nbd_reply_raw*) proxy->rsp.buf;
/* We can't assume the NBD_REPLY_SIZE + req->len is what we'll get back */
count = iobuf_read( proxy->upstream_fd, &proxy->rsp, NBD_REPLY_SIZE );
if ( count == -1 ) {
warn( SHOW_ERRNO( "Failed to get reply from upstream" ) );
goto disconnect;
}
if ( proxy->rsp.needle == NBD_REPLY_SIZE ) {
nbd_r2h_reply( reply_raw, reply );
if ( reply->magic != REPLY_MAGIC ) {
warn( "Reply magic is incorrect" );
goto disconnect;
}
if ( reply->error != 0 ) {
warn( "NBD error returned from upstream: %"PRIu32, reply->error );
goto disconnect;
}
if ( ( proxy->req_hdr.type & REQUEST_MASK ) == REQUEST_READ ) {
/* Get the read reply data too. */
proxy->rsp.size += proxy->req_hdr.len;
}
}
if ( proxy->rsp.size == proxy->rsp.needle ) {
debug( "NBD reply received from upstream." );
proxy->rsp.needle = 0;
return WRITE_TO_DOWNSTREAM;
}
return state;
disconnect:
proxy->rsp.needle = 0;
proxy->rsp.size = 0;
return CONNECT_TO_UPSTREAM;
}
int proxy_write_to_downstream( struct proxier* proxy, int state )
{
ssize_t count;
// assert( state == WRITE_TO_DOWNSTREAM );
if ( !proxy->hello_sent ) {
info( "Writing init to downstream" );
}
count = iobuf_write( proxy->downstream_fd, &proxy->rsp );
if ( count == -1 ) {
warn( SHOW_ERRNO( "Failed to write to downstream" ) );
return EXIT;
}
if ( proxy->rsp.needle == proxy->rsp.size ) {
if ( !proxy->hello_sent ) {
info( "Hello message sent to client" );
proxy->hello_sent = 1;
} else {
debug( "Reply sent" );
proxy->req_count++;
}
/* We're done with the request & response buffers now */
proxy->req.size = 0;
proxy->req.needle = 0;
proxy->rsp.size = 0;
proxy->rsp.needle = 0;
return READ_FROM_DOWNSTREAM;
}
return state;
}
/* Non-blocking proxy session. Simple(ish) state machine. We read from d/s until
* we have a full request, then try to write that request u/s. If writing fails,
* we reconnect to upstream and retry. Once we've successfully written, we
* attempt to read the reply. If that fails or times out (we give it 30 seconds)
* then we disconnect from u/s and go back to trying to reconnect and resend.
*
* This is the second-simplest NBD proxy I can think of. The first version was
* non-blocking I/O, but it was getting impossible to manage exceptional stuff
*/
void proxy_session( struct proxier* proxy )
{
uint64_t state_started = monotonic_time_ms();
int old_state = EXIT;
int state;
int connect_to_upstream_cooldown = 0;
/* First action: Write hello to downstream */
nbd_hello_to_buf( (struct nbd_init_raw *) proxy->rsp.buf, proxy->upstream_size );
proxy->rsp.size = sizeof( struct nbd_init_raw );
proxy->rsp.needle = 0;
state = WRITE_TO_DOWNSTREAM;
info( "Beginning proxy session on fd %i", proxy->downstream_fd );
while( state != EXIT ) {
struct timeval select_timeout = {
.tv_sec = 0,
.tv_usec = 0
};
struct timeval *select_timeout_ptr = NULL;
int result; /* used by select() */
fd_set rfds;
fd_set wfds;
FD_ZERO( &rfds );
FD_ZERO( &wfds );
if ( state != old_state ) {
state_started = monotonic_time_ms();
debug(
"State transition from %s to %s",
proxy_session_state_names[old_state],
proxy_session_state_names[state]
);
} else {
debug( "Proxy is in state %s", proxy_session_state_names[state] );
}
old_state = state;
switch( state ) {
case READ_FROM_DOWNSTREAM:
FD_SET( proxy->downstream_fd, &rfds );
break;
case WRITE_TO_DOWNSTREAM:
FD_SET( proxy->downstream_fd, &wfds );
break;
case WRITE_TO_UPSTREAM:
select_timeout.tv_sec = 15;
FD_SET(proxy->upstream_fd, &wfds );
break;
case CONNECT_TO_UPSTREAM:
/* upstream_fd is now -1 */
proxy_disconnect_from_upstream( proxy );
if ( connect_to_upstream_cooldown ) {
connect_to_upstream_cooldown = 0;
select_timeout.tv_sec = 3;
} else {
proxy_start_connect_to_upstream( proxy );
if ( proxy->upstream_fd == -1 ) {
warn( SHOW_ERRNO( "Error acquiring socket to upstream" ) );
continue;
}
FD_SET( proxy->upstream_fd, &wfds );
select_timeout.tv_sec = 15;
}
break;
case READ_INIT_FROM_UPSTREAM:
case READ_FROM_UPSTREAM:
select_timeout.tv_sec = 15;
FD_SET( proxy->upstream_fd, &rfds );
break;
};
if ( select_timeout.tv_sec > 0 ) {
select_timeout_ptr = &select_timeout;
}
result = sock_try_select( FD_SETSIZE, &rfds, &wfds, NULL, select_timeout_ptr );
if ( result == -1 ) {
warn( SHOW_ERRNO( "select() failed: " ) );
break;
}
/* Happens after failed reconnect. Avoid SIGBUS on FD_ISSET() */
if ( proxy->upstream_fd == -1 ) {
continue;
}
switch( state ) {
case READ_FROM_DOWNSTREAM:
if ( FD_ISSET( proxy->downstream_fd, &rfds ) ) {
state = proxy_read_from_downstream( proxy, state );
#ifdef PREFETCH
/* Check if we can fulfil the request from prefetch, or
* rewrite the request to fill the prefetch buffer if needed
*/
if ( state == WRITE_TO_UPSTREAM ) {
state = proxy_prefetch_for_request( proxy, state );
}
#endif
}
break;
case CONNECT_TO_UPSTREAM:
if ( FD_ISSET( proxy->upstream_fd, &wfds ) ) {
state = proxy_continue_connecting_to_upstream( proxy, state );
}
/* Leaving state untouched will retry connecting to upstream -
* so introduce a bit of sleep */
if ( state == CONNECT_TO_UPSTREAM ) {
connect_to_upstream_cooldown = 1;
}
break;
case READ_INIT_FROM_UPSTREAM:
state = proxy_read_init_from_upstream( proxy, state );
if ( state == CONNECT_TO_UPSTREAM ) {
connect_to_upstream_cooldown = 1;
}
break;
case WRITE_TO_UPSTREAM:
if ( FD_ISSET( proxy->upstream_fd, &wfds ) ) {
state = proxy_write_to_upstream( proxy, state );
}
break;
case READ_FROM_UPSTREAM:
if ( FD_ISSET( proxy->upstream_fd, &rfds ) ) {
state = proxy_read_from_upstream( proxy, state );
}
# ifdef PREFETCH
/* Fill the prefetch buffer and rewrite the reply, if needed */
if ( state == WRITE_TO_DOWNSTREAM ) {
state = proxy_prefetch_for_reply( proxy, state );
}
#endif
break;
case WRITE_TO_DOWNSTREAM:
if ( FD_ISSET( proxy->downstream_fd, &wfds ) ) {
state = proxy_write_to_downstream( proxy, state );
}
break;
}
/* In these states, we're interested in restarting after a timeout.
*/
if ( old_state == state && proxy_state_upstream( state ) ) {
if ( ( monotonic_time_ms() ) - state_started > UPSTREAM_TIMEOUT ) {
warn(
"Timed out in state %s while communicating with upstream",
proxy_session_state_names[state]
);
state = CONNECT_TO_UPSTREAM;
}
}
}
info(
"Finished proxy session on fd %i after %"PRIu64" successful request(s)",
proxy->downstream_fd, proxy->req_count
);
/* Reset these two for the next session */
proxy->req_count = 0;
proxy->hello_sent = 0;
return;
}
/** Accept an NBD socket connection, dispatch appropriately */
int proxy_accept( struct proxier* params )
{
NULLCHECK( params );
int client_fd;
fd_set fds;
union mysockaddr client_address;
socklen_t socklen = sizeof( client_address );
info( "Waiting for client connection" );
FD_ZERO(&fds);
FD_SET(params->listen_fd, &fds);
FATAL_IF_NEGATIVE(
sock_try_select(FD_SETSIZE, &fds, NULL, NULL, NULL),
SHOW_ERRNO( "select() failed" )
);
if ( FD_ISSET( params->listen_fd, &fds ) ) {
client_fd = accept( params->listen_fd, &client_address.generic, &socklen );
if ( client_address.family != AF_UNIX ) {
if ( sock_set_tcp_nodelay(client_fd, 1) == -1 ) {
warn( SHOW_ERRNO( "Failed to set TCP_NODELAY" ) );
}
}
info( "Accepted nbd client socket fd %d", client_fd );
sock_set_nonblock( client_fd, 1 );
params->downstream_fd = client_fd;
proxy_session( params );
WARN_IF_NEGATIVE(
sock_try_close( params->downstream_fd ),
"Couldn't close() downstream fd %i after proxy session",
params->downstream_fd
);
params->downstream_fd = -1;
}
return 1; /* We actually expect to be interrupted by signal handlers */
}
void proxy_accept_loop( struct proxier* params )
{
NULLCHECK( params );
while( proxy_accept( params ) );
}
/** Closes sockets */
void proxy_cleanup( struct proxier* proxy )
{
NULLCHECK( proxy );
info( "Cleaning up" );
if ( -1 != proxy->listen_fd ) {
if ( AF_UNIX == proxy->listen_on.family ) {
if ( -1 == unlink( proxy->listen_on.un.sun_path ) ) {
warn( SHOW_ERRNO( "Failed to unlink %s", proxy->listen_on.un.sun_path ) );
}
}
WARN_IF_NEGATIVE(
sock_try_close( proxy->listen_fd ),
SHOW_ERRNO( "Failed to close() listen fd %i", proxy->listen_fd )
);
proxy->listen_fd = -1;
}
if ( -1 != proxy->downstream_fd ) {
WARN_IF_NEGATIVE(
sock_try_close( proxy->downstream_fd ),
SHOW_ERRNO(
"Failed to close() downstream fd %i", proxy->downstream_fd
)
);
proxy->downstream_fd = -1;
}
if ( -1 != proxy->upstream_fd ) {
WARN_IF_NEGATIVE(
sock_try_close( proxy->upstream_fd ),
SHOW_ERRNO(
"Failed to close() upstream fd %i", proxy->upstream_fd
)
);
proxy->upstream_fd = -1;
}
info( "Cleanup done" );
}
/** Full lifecycle of the proxier */
int do_proxy( struct proxier* params )
{
NULLCHECK( params );
info( "Ensuring upstream server is open" );
if ( !proxy_connect_to_upstream( params ) ) {
warn( "Couldn't connect to upstream server during initialization, exiting" );
proxy_cleanup( params );
return 1;
};
proxy_open_listen_socket( params );
proxy_accept_loop( params );
proxy_cleanup( params );
return 0;
}
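
The reconnect path above splits connect() into two halves: proxy_start_connect_to_upstream() issues connect() on a non-blocking socket and accepts EINPROGRESS, and proxy_continue_connecting_to_upstream() reads SO_ERROR once select() reports the socket writable. A minimal standalone sketch of that second half follows. The name `pending_connect_error` is hypothetical and is not part of flexnbd.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper, for illustration only.
 * Returns 0 if the socket has no pending error (the connect succeeded),
 * the pending errno value if the asynchronous connect() failed, or -1 if
 * the getsockopt() query itself failed (errno is set in that case). */
int pending_connect_error(int fd)
{
    int error = 0;
    socklen_t len = sizeof(error);

    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &error, &len) == -1) {
        return -1;
    }
    return error;
}
```

A caller would typically invoke this only after select() marks the fd writable, and treat any non-zero result as a reason to close the socket and retry after a cooldown.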


@@ -1,98 +0,0 @@
#ifndef PROXY_H
#define PROXY_H
#include <sys/types.h>
#include <unistd.h>
#include "ioutil.h"
#include "flexnbd.h"
#include "parse.h"
#include "nbdtypes.h"
#include "self_pipe.h"
#ifdef PREFETCH
#include "prefetch.h"
#endif
/** UPSTREAM_TIMEOUT
* How long ( in ms ) to allow for upstream to respond. If it takes longer
* than this, we will cancel the current request-response to them and resubmit
*/
#define UPSTREAM_TIMEOUT ( 30 * 1000 )
struct proxier {
/* The flexnbd wrapper this proxier is attached to */
struct flexnbd* flexnbd;
/** address/port to bind to */
union mysockaddr listen_on;
/** address/port to connect to */
union mysockaddr connect_to;
/** address to bind to when making outgoing connections */
union mysockaddr connect_from;
int bind; /* Set to true if we should use it */
/* The socket we listen() on and accept() against */
int listen_fd;
/* The socket returned by accept() that we receive requests from and send
* responses to
*/
int downstream_fd;
/* The socket returned by connect() that we send requests to and receive
* responses from
*/
int upstream_fd;
/* This is the size we advertise to the downstream client */
off64_t upstream_size;
/* We transform the raw request header into here */
struct nbd_request req_hdr;
/* We transform the raw reply header into here */
struct nbd_reply rsp_hdr;
/* Used for our non-blocking negotiation with upstream. TODO: maybe use
* for downstream as well ( we currently overload rsp ) */
struct iobuf init;
/* The current NBD request from downstream */
struct iobuf req;
/* The current NBD reply from upstream */
struct iobuf rsp;
/* It's starting to feel like we need an object for a single proxy session.
* These two track how many requests we've sent so far, and whether the
* NBD_INIT code has been sent to the client yet.
*/
uint64_t req_count;
int hello_sent;
#ifdef PREFETCH
/* While the in-flight request has been munged by prefetch, these two are
* set to true, and the original length of the request, respectively */
int is_prefetch_req;
uint32_t prefetch_req_orig_len;
/* And here, we actually store the prefetched data once it's returned */
struct prefetch *prefetch;
#endif
};
struct proxier* proxy_create(
char* s_downstream_address,
char* s_downstream_port,
char* s_upstream_address,
char* s_upstream_port,
char* s_upstream_bind );
int do_proxy( struct proxier* proxy );
void proxy_cleanup( struct proxier* proxy );
void proxy_destroy( struct proxier* proxy );
#endif
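
The req, rsp and init members above are accessed throughout proxy.c via three fields: buf, size and needle. The definition of struct iobuf lives in ioutil.h and is not shown here, so the following is only a sketch under assumptions: `sketch_iobuf` and `sketch_iobuf_read` are hypothetical stand-ins that illustrate how a needle can accumulate partial reads until it reaches size.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for struct iobuf; field meanings are assumed. */
struct sketch_iobuf {
    unsigned char *buf;
    size_t size;    /* total bytes expected for the current message */
    size_t needle;  /* bytes transferred so far */
};

/* Read as much of the current message as is available right now.
 * Returns the number of bytes read by this call, 0 if nothing could be
 * read without blocking, or -1 on error or end-of-file. */
ssize_t sketch_iobuf_read(int fd, struct sketch_iobuf *iobuf,
                          size_t default_size)
{
    ssize_t count;

    if (iobuf->size == 0) {
        iobuf->size = default_size;     /* start of a new message */
    }
    if (iobuf->needle >= iobuf->size) {
        return 0;                       /* message already complete */
    }

    count = read(fd, iobuf->buf + iobuf->needle,
                 iobuf->size - iobuf->needle);
    if (count > 0) {
        iobuf->needle += (size_t) count;
    } else if (count == 0) {
        errno = EPIPE;                  /* assumption: treat EOF as an error */
        return -1;
    } else if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR) {
        count = 0;                      /* try again on the next select() */
    }
    return count;
}
```

With this shape, a caller can compare needle against size after each call to decide whether to advance the state machine, and can grow size once a header has been parsed.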

78
src/proxy/prefetch.c Normal file

@@ -0,0 +1,78 @@
#include "prefetch.h"
#include "util.h"
struct prefetch *prefetch_create(size_t size_bytes)
{
struct prefetch *out = xmalloc(sizeof(struct prefetch));
NULLCHECK(out);
out->buffer = xmalloc(size_bytes);
NULLCHECK(out->buffer);
out->size = size_bytes;
out->is_full = 0;
out->from = 0;
out->len = 0;
return out;
}
void prefetch_destroy(struct prefetch *prefetch)
{
if (prefetch) {
free(prefetch->buffer);
free(prefetch);
}
}
size_t prefetch_size(struct prefetch *prefetch)
{
if (prefetch) {
return prefetch->size;
} else {
return 0;
}
}
void prefetch_set_is_empty(struct prefetch *prefetch)
{
prefetch_set_full(prefetch, 0);
}
void prefetch_set_is_full(struct prefetch *prefetch)
{
prefetch_set_full(prefetch, 1);
}
void prefetch_set_full(struct prefetch *prefetch, int val)
{
if (prefetch) {
prefetch->is_full = val;
}
}
int prefetch_is_full(struct prefetch *prefetch)
{
if (prefetch) {
return prefetch->is_full;
} else {
return 0;
}
}
int prefetch_contains(struct prefetch *prefetch, uint64_t from,
uint32_t len)
{
NULLCHECK(prefetch);
return from >= prefetch->from &&
from + len <= prefetch->from + prefetch->len;
}
char *prefetch_offset(struct prefetch *prefetch, uint64_t from)
{
NULLCHECK(prefetch);
return prefetch->buffer + (from - prefetch->from);
}
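
prefetch_contains() and prefetch_offset() above treat the cache as a window [from, from + len) over the upstream image. The standalone sketch below restates that arithmetic with a few concrete ranges. `struct window` and the `window_*` names are hypothetical and exist only so the example compiles on its own.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for struct prefetch. */
struct window {
    uint64_t from;   /* image offset of the first cached byte */
    uint32_t len;    /* number of cached bytes */
    char *buffer;    /* the cached bytes themselves */
};

/* True if [from, from + len) lies entirely inside the cached window. */
int window_contains(const struct window *w, uint64_t from, uint32_t len)
{
    return from >= w->from && from + len <= w->from + w->len;
}

/* Pointer to the cached byte that corresponds to image offset `from`.
 * Only meaningful when window_contains() has already returned true. */
char *window_offset(const struct window *w, uint64_t from)
{
    return w->buffer + (from - w->from);
}
```

For a window covering image offsets 4096 to 8191, a 1024-byte request at offset 4608 is a hit served from buffer + 512, while a 2-byte request at offset 8191 is a miss because it runs one byte past the window.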

34
src/proxy/prefetch.h Normal file

@@ -0,0 +1,34 @@
#ifndef PREFETCH_H
#define PREFETCH_H
#include <stdint.h>
#include <stddef.h>
#define PREFETCH_BUFSIZE 4096
struct prefetch {
/* True if there is data in the buffer. */
int is_full;
/* The start point of the current content of buffer */
uint64_t from;
/* The length of the current content of buffer */
uint32_t len;
/* The total size of the buffer, in bytes. */
size_t size;
char *buffer;
};
struct prefetch *prefetch_create(size_t size_bytes);
void prefetch_destroy(struct prefetch *prefetch);
size_t prefetch_size(struct prefetch *);
void prefetch_set_is_empty(struct prefetch *prefetch);
void prefetch_set_is_full(struct prefetch *prefetch);
void prefetch_set_full(struct prefetch *prefetch, int val);
int prefetch_is_full(struct prefetch *prefetch);
int prefetch_contains(struct prefetch *prefetch, uint64_t from,
uint32_t len);
char *prefetch_offset(struct prefetch *prefetch, uint64_t from);
#endif
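
The proxy fills this buffer by doubling the length of eligible read requests and keeping the second half of the reply. A sketch of one way to express the eligibility rule follows. `should_prefetch` is a hypothetical name, and unlike the proxy code this version widens the length to 64 bits before doubling, which is an assumption made here to keep the overflow behaviour easy to reason about.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, for illustration only.
 * Decide whether a request may be widened to twice its length so that
 * the extra bytes can be kept in the prefetch buffer. */
int should_prefetch(int is_read, uint64_t from, uint32_t len,
                    size_t buffer_size, uint64_t upstream_size)
{
    /* End of the widened request: original bytes plus the same again. */
    uint64_t end = from + (uint64_t) len * 2;

    return is_read
        && len <= buffer_size       /* extra bytes must fit in the buffer */
        && from < end               /* rejects len == 0 and wrap-around */
        && end <= upstream_size;    /* never read past the end of the image */
}
```

For an 8192-byte image and a 4096-byte buffer, a 4096-byte read at offset 0 qualifies, whereas the same read at offset 4096 does not, because the widened request would end at 12288.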

988
src/proxy/proxy.c Normal file

@@ -0,0 +1,988 @@
#include "proxy.h"
#include "readwrite.h"
#include "prefetch.h"
#include "ioutil.h"
#include "sockutil.h"
#include "util.h"
#include <errno.h>
#include <sys/socket.h>
#include <netinet/tcp.h>
struct proxier *proxy_create(char *s_downstream_address,
char *s_downstream_port,
char *s_upstream_address,
char *s_upstream_port,
char *s_upstream_bind, char *s_cache_bytes)
{
struct proxier *out;
out = xmalloc(sizeof(struct proxier));
FATAL_IF_NULL(s_downstream_address, "Listen address not specified");
NULLCHECK(s_downstream_address);
FATAL_UNLESS(parse_to_sockaddr
(&out->listen_on.generic, s_downstream_address),
"Couldn't parse downstream address %s", s_downstream_address);
if (out->listen_on.family != AF_UNIX) {
FATAL_IF_NULL(s_downstream_port, "Downstream port not specified");
NULLCHECK(s_downstream_port);
parse_port(s_downstream_port, &out->listen_on.v4);
}
FATAL_IF_NULL(s_upstream_address, "Upstream address not specified");
NULLCHECK(s_upstream_address);
FATAL_UNLESS(parse_ip_to_sockaddr
(&out->connect_to.generic, s_upstream_address),
"Couldn't parse upstream address '%s'",
s_upstream_address);
FATAL_IF_NULL(s_upstream_port, "Upstream port not specified");
NULLCHECK(s_upstream_port);
parse_port(s_upstream_port, &out->connect_to.v4);
if (s_upstream_bind) {
FATAL_IF_ZERO(parse_ip_to_sockaddr
(&out->connect_from.generic, s_upstream_bind),
"Couldn't parse bind address '%s'", s_upstream_bind);
out->bind = 1;
}
out->listen_fd = -1;
out->downstream_fd = -1;
out->upstream_fd = -1;
out->prefetch = NULL;
if (s_cache_bytes) {
int cache_bytes = atoi(s_cache_bytes);
/* leaving this off or setting a cache size of zero or
* less results in no cache.
*/
if (cache_bytes > 0) {
out->prefetch = prefetch_create(cache_bytes);
}
}
out->init.buf = xmalloc(sizeof(struct nbd_init_raw));
/* Add on the request / reply size to our malloc to accommodate both
* the struct and the data
*/
out->req.buf = xmalloc(NBD_MAX_SIZE + NBD_REQUEST_SIZE);
out->rsp.buf = xmalloc(NBD_MAX_SIZE + NBD_REPLY_SIZE);
log_context =
xmalloc(strlen(s_upstream_address) + strlen(s_upstream_port) + 2);
sprintf(log_context, "%s:%s", s_upstream_address, s_upstream_port);
return out;
}
int proxy_prefetches(struct proxier *proxy)
{
NULLCHECK(proxy);
return proxy->prefetch != NULL;
}
int proxy_prefetch_bufsize(struct proxier *proxy)
{
NULLCHECK(proxy);
return prefetch_size(proxy->prefetch);
}
void proxy_destroy(struct proxier *proxy)
{
free(proxy->init.buf);
free(proxy->req.buf);
free(proxy->rsp.buf);
prefetch_destroy(proxy->prefetch);
free(proxy);
}
/* Shared between our two different connect_to_upstream paths */
void proxy_finish_connect_to_upstream(struct proxier *proxy, uint64_t size,
uint32_t flags);
/* Try to establish a connection to our upstream server. Return 1 on success,
* 0 on failure. This is a blocking call that returns a non-blocking socket.
*/
int proxy_connect_to_upstream(struct proxier *proxy)
{
struct sockaddr *connect_from = NULL;
if (proxy->bind) {
connect_from = &proxy->connect_from.generic;
}
int fd = socket_connect(&proxy->connect_to.generic, connect_from);
uint64_t size = 0;
uint32_t flags = 0;
if (-1 == fd) {
return 0;
}
if (!socket_nbd_read_hello(fd, &size, &flags)) {
WARN_IF_NEGATIVE(sock_try_close(fd),
"Couldn't close() after failed read of NBD hello on fd %i",
fd);
return 0;
}
proxy->upstream_fd = fd;
sock_set_nonblock(fd, 1);
proxy_finish_connect_to_upstream(proxy, size, flags);
return 1;
}
/* First half of non-blocking connection to upstream. Gets as far as calling
* connect() on a non-blocking socket.
*/
void proxy_start_connect_to_upstream(struct proxier *proxy)
{
int fd, result;
struct sockaddr *from = NULL;
struct sockaddr *to = &proxy->connect_to.generic;
if (proxy->bind) {
from = &proxy->connect_from.generic;
}
fd = socket(to->sa_family, SOCK_STREAM, 0);
if (fd < 0) {
warn(SHOW_ERRNO
("Couldn't create socket to reconnect to upstream"));
return;
}
info("Beginning non-blocking connection to upstream on fd %i", fd);
if (NULL != from) {
if (0 > bind(fd, from, sockaddr_size(from))) {
warn(SHOW_ERRNO("bind() to source address failed"));
}
}
result = sock_set_nonblock(fd, 1);
if (result == -1) {
warn(SHOW_ERRNO("Failed to set upstream fd %i non-blocking", fd));
goto error;
}
result = connect(fd, to, sockaddr_size(to));
if (result == -1 && errno != EINPROGRESS) {
warn(SHOW_ERRNO("Failed to start connect()ing to upstream!"));
goto error;
}
proxy->upstream_fd = fd;
return;
error:
if (sock_try_close(fd) == -1) {
/* Non-fatal leak, although still nasty */
warn(SHOW_ERRNO("Failed to close fd for upstream %i", fd));
}
return;
}
void proxy_finish_connect_to_upstream(struct proxier *proxy, uint64_t size,
uint32_t flags)
{
if (proxy->upstream_size == 0) {
info("Size of upstream image is %" PRIu64 " bytes", size);
} else if (proxy->upstream_size != size) {
warn("Size changed from %" PRIu64 " to %" PRIu64 " bytes",
proxy->upstream_size, size);
}
proxy->upstream_size = size;
if (proxy->upstream_flags == 0) {
info("Upstream transmission flags set to %" PRIu32 "", flags);
} else if (proxy->upstream_flags != flags) {
warn("Upstream transmission flags changed from %" PRIu32 " to %"
PRIu32 "", proxy->upstream_flags, flags);
}
proxy->upstream_flags = flags;
if (AF_UNIX != proxy->connect_to.family) {
if (sock_set_tcp_nodelay(proxy->upstream_fd, 1) == -1) {
warn(SHOW_ERRNO("Failed to set TCP_NODELAY"));
}
}
info("Connected to upstream on fd %i", proxy->upstream_fd);
return;
}
void proxy_disconnect_from_upstream(struct proxier *proxy)
{
if (-1 != proxy->upstream_fd) {
info("Closing upstream connection on fd %i", proxy->upstream_fd);
/* TODO: An NBD disconnect would be pleasant here */
WARN_IF_NEGATIVE(sock_try_close(proxy->upstream_fd),
"Failed to close() fd %i when disconnecting from upstream",
proxy->upstream_fd);
proxy->upstream_fd = -1;
}
}
/** Prepares a listening socket for the NBD server, binding etc. */
void proxy_open_listen_socket(struct proxier *params)
{
NULLCHECK(params);
params->listen_fd = socket(params->listen_on.family, SOCK_STREAM, 0);
FATAL_IF_NEGATIVE(params->listen_fd,
SHOW_ERRNO("Couldn't create listen socket")
);
/* Allow us to restart quickly */
FATAL_IF_NEGATIVE(sock_set_reuseaddr(params->listen_fd, 1),
SHOW_ERRNO("Couldn't set SO_REUSEADDR")
);
if (AF_UNIX != params->listen_on.family) {
FATAL_IF_NEGATIVE(sock_set_tcp_nodelay(params->listen_fd, 1),
SHOW_ERRNO("Couldn't set TCP_NODELAY")
);
}
FATAL_UNLESS_ZERO(sock_try_bind
(params->listen_fd, &params->listen_on.generic),
SHOW_ERRNO("Failed to bind to listening socket")
);
/* We're only serving one client at a time, hence backlog of 1 */
FATAL_IF_NEGATIVE(listen(params->listen_fd, 1),
SHOW_ERRNO("Failed to listen on listening socket")
);
info("Now listening for incoming connections");
return;
}
typedef enum {
EXIT,
WRITE_TO_DOWNSTREAM,
READ_FROM_DOWNSTREAM,
CONNECT_TO_UPSTREAM,
READ_INIT_FROM_UPSTREAM,
WRITE_TO_UPSTREAM,
READ_FROM_UPSTREAM
} proxy_session_states;
static char *proxy_session_state_names[] = {
"EXIT",
"WRITE_TO_DOWNSTREAM",
"READ_FROM_DOWNSTREAM",
"CONNECT_TO_UPSTREAM",
"READ_INIT_FROM_UPSTREAM",
"WRITE_TO_UPSTREAM",
"READ_FROM_UPSTREAM"
};
static inline int proxy_state_upstream(int state)
{
return state == CONNECT_TO_UPSTREAM || state == READ_INIT_FROM_UPSTREAM
|| state == WRITE_TO_UPSTREAM || state == READ_FROM_UPSTREAM;
}
int proxy_prefetch_for_request(struct proxier *proxy, int state)
{
NULLCHECK(proxy);
struct nbd_request *req = &proxy->req_hdr;
struct nbd_reply *rsp = &proxy->rsp_hdr;
struct nbd_request_raw *req_raw =
(struct nbd_request_raw *) proxy->req.buf;
struct nbd_reply_raw *rsp_raw =
(struct nbd_reply_raw *) proxy->rsp.buf;
int is_read = req->type == REQUEST_READ;
if (is_read) {
/* See if we can respond with what's in our prefetch
* cache */
if (prefetch_is_full(proxy->prefetch) &&
prefetch_contains(proxy->prefetch, req->from, req->len)) {
/* HUZZAH! A match! */
debug("Prefetch hit!");
/* First build a reply header */
rsp->magic = REPLY_MAGIC;
rsp->error = 0;
memcpy(&rsp->handle, &req->handle, 8);
/* now copy it into the response */
nbd_h2r_reply(rsp, rsp_raw);
/* and the data */
memcpy(proxy->rsp.buf + NBD_REPLY_SIZE,
prefetch_offset(proxy->prefetch, req->from), req->len);
proxy->rsp.size = NBD_REPLY_SIZE + req->len;
proxy->rsp.needle = 0;
/* return early, our work here is done */
return WRITE_TO_DOWNSTREAM;
}
} else {
/* Safety catch. If we're sending a write request, we
* blow away the cache. This is very pessimistic, but
* it's simpler (and therefore safer) than working out
* whether we can keep it or not.
*/
debug("Blowing away prefetch cache on type %d request.",
req->type);
prefetch_set_is_empty(proxy->prefetch);
}
debug("Prefetch cache MISS!");
uint64_t prefetch_start = req->from;
/* We prefetch what we expect to be the next request. */
uint64_t prefetch_end = req->from + (req->len * 2);
/* We only want to consider prefetching if we know we're not
* getting too much data back, if it's a read request, and if
* the prefetch won't try to read past the end of the file.
*/
int prefetching =
req->len <= prefetch_size(proxy->prefetch) &&
is_read &&
prefetch_start < prefetch_end &&
prefetch_end <= proxy->upstream_size;
/* We pull the request out of the proxy struct, rewrite the
* request size, and write it back.
*/
if (prefetching) {
proxy->is_prefetch_req = 1;
proxy->prefetch_req_orig_len = req->len;
req->len *= 2;
debug("Prefetching additional %" PRIu32 " bytes",
req->len - proxy->prefetch_req_orig_len);
nbd_h2r_request(req, req_raw);
}
return state;
}
int proxy_prefetch_for_reply(struct proxier *proxy, int state)
{
size_t prefetched_bytes;
if (!proxy->is_prefetch_req) {
return state;
}
prefetched_bytes = proxy->req_hdr.len - proxy->prefetch_req_orig_len;
debug("Prefetched additional %zu bytes", prefetched_bytes);
memcpy(proxy->prefetch->buffer,
proxy->rsp.buf + proxy->prefetch_req_orig_len + NBD_REPLY_SIZE,
prefetched_bytes);
proxy->prefetch->from =
proxy->req_hdr.from + proxy->prefetch_req_orig_len;
proxy->prefetch->len = prefetched_bytes;
/* We've finished with proxy->req by now, so don't need to alter it to make
* it look like the request was before prefetch */
/* Truncate the bytes we'll write downstream */
proxy->req_hdr.len = proxy->prefetch_req_orig_len;
proxy->rsp.size -= prefetched_bytes;
/* And we need to reset these */
prefetch_set_is_full(proxy->prefetch);
proxy->is_prefetch_req = 0;
return state;
}
int proxy_read_from_downstream(struct proxier *proxy, int state)
{
ssize_t count;
struct nbd_request_raw *request_raw =
(struct nbd_request_raw *) proxy->req.buf;
struct nbd_request *request = &(proxy->req_hdr);
// assert( state == READ_FROM_DOWNSTREAM );
count =
iobuf_read(proxy->downstream_fd, &proxy->req, NBD_REQUEST_SIZE);
if (count == -1) {
warn(SHOW_ERRNO("Couldn't read request from downstream"));
return EXIT;
}
if (proxy->req.needle == NBD_REQUEST_SIZE) {
nbd_r2h_request(request_raw, request);
if (request->type == REQUEST_DISCONNECT) {
info("Received disconnect request from client");
return EXIT;
}
/* Simple validations -- the request / reply size have already
* been taken into account in the xmalloc, so no need to worry
* about them here
*/
if (request->type == REQUEST_READ) {
if (request->len > NBD_MAX_SIZE) {
warn("NBD read request size %" PRIu32 " too large",
request->len);
return EXIT;
}
}
if (request->type == REQUEST_WRITE) {
if (request->len > NBD_MAX_SIZE) {
warn("NBD write request size %" PRIu32 " too large",
request->len);
return EXIT;
}
proxy->req.size += request->len;
}
}
if (proxy->req.needle == proxy->req.size) {
debug("Received NBD request from downstream. type=%" PRIu16
" flags=%" PRIu16 " from=%" PRIu64 " len=%" PRIu32,
request->type, request->flags, request->from, request->len);
/* Finished reading, so advance state. Leave size untouched so the next
* state knows how many bytes to write */
proxy->req.needle = 0;
return WRITE_TO_UPSTREAM;
}
return state;
}
int proxy_continue_connecting_to_upstream(struct proxier *proxy, int state)
{
int error, result;
socklen_t len = sizeof(error);
// assert( state == CONNECT_TO_UPSTREAM );
result =
getsockopt(proxy->upstream_fd, SOL_SOCKET, SO_ERROR, &error, &len);
if (result == -1) {
warn(SHOW_ERRNO("Failed to tell if connected to upstream"));
return state;
}
if (error != 0) {
errno = error;
warn(SHOW_ERRNO("Failed to connect to upstream"));
return state;
}
/* Data may have changed while we were disconnected */
prefetch_set_is_empty(proxy->prefetch);
info("Connected to upstream on fd %i", proxy->upstream_fd);
return READ_INIT_FROM_UPSTREAM;
}
int proxy_read_init_from_upstream(struct proxier *proxy, int state)
{
ssize_t count;
// assert( state == READ_INIT_FROM_UPSTREAM );
count =
iobuf_read(proxy->upstream_fd, &proxy->init,
sizeof(struct nbd_init_raw));
if (count == -1) {
warn(SHOW_ERRNO("Failed to read init from upstream"));
goto disconnect;
}
if (proxy->init.needle == proxy->init.size) {
uint64_t upstream_size;
uint32_t upstream_flags;
if (!nbd_check_hello
((struct nbd_init_raw *) proxy->init.buf, &upstream_size,
&upstream_flags)) {
warn("Upstream sent invalid init");
goto disconnect;
}
/* record the flags, and log the reconnection, set TCP_NODELAY */
proxy_finish_connect_to_upstream(proxy, upstream_size,
upstream_flags);
/* Currently, we only get disconnected from upstream (so needing to come
* here) when we have an outstanding request. If that becomes false,
* we'll need to choose the right state to return to here */
proxy->init.needle = 0;
return WRITE_TO_UPSTREAM;
}
return state;
disconnect:
proxy->init.needle = 0;
proxy->init.size = 0;
return CONNECT_TO_UPSTREAM;
}
int proxy_write_to_upstream(struct proxier *proxy, int state)
{
ssize_t count;
// assert( state == WRITE_TO_UPSTREAM );
/* FIXME: We may set cork=1 multiple times as a result of this idiom.
* Not a serious problem, but we could do better
*/
if (proxy->req.needle == 0 && AF_UNIX != proxy->connect_to.family) {
if (sock_set_tcp_cork(proxy->upstream_fd, 1) == -1) {
warn(SHOW_ERRNO("Failed to set TCP_CORK"));
}
}
count = iobuf_write(proxy->upstream_fd, &proxy->req);
if (count == -1) {
warn(SHOW_ERRNO("Failed to send request to upstream"));
proxy->req.needle = 0;
// We're throwing the socket away so no need to uncork
return CONNECT_TO_UPSTREAM;
}
if (proxy->req.needle == proxy->req.size) {
/* Request sent. Advance to reading the response from upstream. We might
* still need req.size if reading the reply fails - we disconnect
* and resend the request in that case - so keep it around for now. */
proxy->req.needle = 0;
if (AF_UNIX != proxy->connect_to.family) {
if (sock_set_tcp_cork(proxy->upstream_fd, 0) == -1) {
warn(SHOW_ERRNO("Failed to unset TCP_CORK"));
// TODO: should we return to CONNECT_TO_UPSTREAM in this instance?
}
}
return READ_FROM_UPSTREAM;
}
return state;
}
int proxy_read_from_upstream(struct proxier *proxy, int state)
{
ssize_t count;
struct nbd_reply *reply = &(proxy->rsp_hdr);
struct nbd_reply_raw *reply_raw =
(struct nbd_reply_raw *) proxy->rsp.buf;
/* We can't assume the NBD_REPLY_SIZE + req->len is what we'll get back */
count = iobuf_read(proxy->upstream_fd, &proxy->rsp, NBD_REPLY_SIZE);
if (count == -1) {
warn(SHOW_ERRNO("Failed to get reply from upstream"));
goto disconnect;
}
if (proxy->rsp.needle == NBD_REPLY_SIZE) {
nbd_r2h_reply(reply_raw, reply);
if (reply->magic != REPLY_MAGIC) {
warn("Reply magic is incorrect");
goto disconnect;
}
if (proxy->req_hdr.type == REQUEST_READ) {
/* Get the read reply data too. */
proxy->rsp.size += proxy->req_hdr.len;
}
}
if (proxy->rsp.size == proxy->rsp.needle) {
debug("NBD reply received from upstream.");
proxy->rsp.needle = 0;
return WRITE_TO_DOWNSTREAM;
}
return state;
disconnect:
proxy->rsp.needle = 0;
proxy->rsp.size = 0;
return CONNECT_TO_UPSTREAM;
}
int proxy_write_to_downstream(struct proxier *proxy, int state)
{
ssize_t count;
// assert( state == WRITE_TO_DOWNSTREAM );
if (!proxy->hello_sent) {
info("Writing init to downstream");
}
count = iobuf_write(proxy->downstream_fd, &proxy->rsp);
if (count == -1) {
warn(SHOW_ERRNO("Failed to write to downstream"));
return EXIT;
}
if (proxy->rsp.needle == proxy->rsp.size) {
if (!proxy->hello_sent) {
info("Hello message sent to client");
proxy->hello_sent = 1;
} else {
debug("Reply sent");
proxy->req_count++;
}
/* We're done with the request & response buffers now */
proxy->req.size = 0;
proxy->req.needle = 0;
proxy->rsp.size = 0;
proxy->rsp.needle = 0;
return READ_FROM_DOWNSTREAM;
}
return state;
}
/* Non-blocking proxy session. Simple(ish) state machine. We read from d/s until
* we have a full request, then try to write that request u/s. If writing fails,
* we reconnect to upstream and retry. Once we've successfully written, we
* attempt to read the reply. If that fails or times out (we give it 30 seconds)
* then we disconnect from u/s and go back to trying to reconnect and resend.
*
* This is the second-simplest NBD proxy I can think of. The first version was
* blocking I/O, but it was getting impossible to manage exceptional stuff
*/
void proxy_session(struct proxier *proxy)
{
uint64_t state_started = monotonic_time_ms();
int old_state = EXIT;
int state;
int connect_to_upstream_cooldown = 0;
/* First action: Write hello to downstream */
nbd_hello_to_buf((struct nbd_init_raw *) proxy->rsp.buf,
proxy->upstream_size, proxy->upstream_flags);
proxy->rsp.size = sizeof(struct nbd_init_raw);
proxy->rsp.needle = 0;
state = WRITE_TO_DOWNSTREAM;
info("Beginning proxy session on fd %i", proxy->downstream_fd);
while (state != EXIT) {
struct timeval select_timeout = {
.tv_sec = 0,
.tv_usec = 0
};
struct timeval *select_timeout_ptr = NULL;
int result; /* used by select() */
fd_set rfds;
fd_set wfds;
FD_ZERO(&rfds);
FD_ZERO(&wfds);
if (state != old_state) {
state_started = monotonic_time_ms();
debug("State transition from %s to %s",
proxy_session_state_names[old_state],
proxy_session_state_names[state]
);
} else {
debug("Proxy is in state %s", proxy_session_state_names[state]);
}
old_state = state;
switch (state) {
case READ_FROM_DOWNSTREAM:
FD_SET(proxy->downstream_fd, &rfds);
break;
case WRITE_TO_DOWNSTREAM:
FD_SET(proxy->downstream_fd, &wfds);
break;
case WRITE_TO_UPSTREAM:
select_timeout.tv_sec = 15;
FD_SET(proxy->upstream_fd, &wfds);
break;
case CONNECT_TO_UPSTREAM:
proxy_disconnect_from_upstream(proxy);
/* upstream_fd is now -1 */
if (connect_to_upstream_cooldown) {
connect_to_upstream_cooldown = 0;
select_timeout.tv_sec = 3;
} else {
proxy_start_connect_to_upstream(proxy);
if (proxy->upstream_fd == -1) {
warn(SHOW_ERRNO("Error acquiring socket to upstream"));
continue;
}
FD_SET(proxy->upstream_fd, &wfds);
select_timeout.tv_sec = 15;
}
break;
case READ_INIT_FROM_UPSTREAM:
case READ_FROM_UPSTREAM:
select_timeout.tv_sec = 15;
FD_SET(proxy->upstream_fd, &rfds);
break;
};
if (select_timeout.tv_sec > 0) {
select_timeout_ptr = &select_timeout;
}
result =
sock_try_select(FD_SETSIZE, &rfds, &wfds, NULL,
select_timeout_ptr);
if (result == -1) {
warn(SHOW_ERRNO("select() failed: "));
break;
}
/* Happens after failed reconnect. Avoid SIGBUS on FD_ISSET() */
if (proxy->upstream_fd == -1) {
continue;
}
switch (state) {
case READ_FROM_DOWNSTREAM:
if (FD_ISSET(proxy->downstream_fd, &rfds)) {
state = proxy_read_from_downstream(proxy, state);
/* Check if we can fulfil the request from prefetch, or
* rewrite the request to fill the prefetch buffer if needed
*/
if (proxy_prefetches(proxy) && state == WRITE_TO_UPSTREAM) {
state = proxy_prefetch_for_request(proxy, state);
}
}
break;
case CONNECT_TO_UPSTREAM:
if (FD_ISSET(proxy->upstream_fd, &wfds)) {
state =
proxy_continue_connecting_to_upstream(proxy, state);
}
/* Leaving state untouched will retry connecting to upstream -
* so introduce a bit of sleep */
if (state == CONNECT_TO_UPSTREAM) {
connect_to_upstream_cooldown = 1;
}
break;
case READ_INIT_FROM_UPSTREAM:
state = proxy_read_init_from_upstream(proxy, state);
if (state == CONNECT_TO_UPSTREAM) {
connect_to_upstream_cooldown = 1;
}
break;
case WRITE_TO_UPSTREAM:
if (FD_ISSET(proxy->upstream_fd, &wfds)) {
state = proxy_write_to_upstream(proxy, state);
}
break;
case READ_FROM_UPSTREAM:
if (FD_ISSET(proxy->upstream_fd, &rfds)) {
state = proxy_read_from_upstream(proxy, state);
}
/* Fill the prefetch buffer and rewrite the reply, if needed */
if (proxy_prefetches(proxy) && state == WRITE_TO_DOWNSTREAM) {
state = proxy_prefetch_for_reply(proxy, state);
}
break;
case WRITE_TO_DOWNSTREAM:
if (FD_ISSET(proxy->downstream_fd, &wfds)) {
state = proxy_write_to_downstream(proxy, state);
}
break;
}
/* In these states, we're interested in restarting after a timeout.
*/
if (old_state == state && proxy_state_upstream(state)) {
if ((monotonic_time_ms()) - state_started > UPSTREAM_TIMEOUT) {
warn("Timed out in state %s while communicating with upstream",
proxy_session_state_names[state]);
state = CONNECT_TO_UPSTREAM;
/* Since we've timed out, we won't have gone through the error-handling
* paths in the various state handlers that reset these appropriately... */
proxy->init.size = 0;
proxy->init.needle = 0;
proxy->rsp.size = 0;
proxy->rsp.needle = 0;
}
}
}
info("Finished proxy session on fd %i after %" PRIu64
" successful request(s)", proxy->downstream_fd, proxy->req_count);
/* Reset these two for the next session */
proxy->req_count = 0;
proxy->hello_sent = 0;
return;
}
/** Accept an NBD socket connection, dispatch appropriately */
int proxy_accept(struct proxier *params)
{
NULLCHECK(params);
int client_fd;
fd_set fds;
union mysockaddr client_address;
socklen_t socklen = sizeof(client_address);
info("Waiting for client connection");
FD_ZERO(&fds);
FD_SET(params->listen_fd, &fds);
FATAL_IF_NEGATIVE(sock_try_select(FD_SETSIZE, &fds, NULL, NULL, NULL),
SHOW_ERRNO("select() failed")
);
if (FD_ISSET(params->listen_fd, &fds)) {
client_fd =
accept(params->listen_fd, &client_address.generic, &socklen);
if (client_fd == -1) {
warn(SHOW_ERRNO("accept() failed"));
return 1;
}
if (client_address.family != AF_UNIX) {
if (sock_set_tcp_nodelay(client_fd, 1) == -1) {
warn(SHOW_ERRNO("Failed to set TCP_NODELAY"));
}
}
info("Accepted nbd client socket fd %d", client_fd);
sock_set_nonblock(client_fd, 1);
params->downstream_fd = client_fd;
proxy_session(params);
WARN_IF_NEGATIVE(sock_try_close(params->downstream_fd),
"Couldn't close() downstream fd %i after proxy session",
params->downstream_fd);
params->downstream_fd = -1;
}
return 1; /* We actually expect to be interrupted by signal handlers */
}
void proxy_accept_loop(struct proxier *params)
{
NULLCHECK(params);
while (proxy_accept(params));
}
/** Closes sockets */
void proxy_cleanup(struct proxier *proxy)
{
NULLCHECK(proxy);
info("Cleaning up");
if (-1 != proxy->listen_fd) {
if (AF_UNIX == proxy->listen_on.family) {
if (-1 == unlink(proxy->listen_on.un.sun_path)) {
warn(SHOW_ERRNO
("Failed to unlink %s",
proxy->listen_on.un.sun_path));
}
}
WARN_IF_NEGATIVE(sock_try_close(proxy->listen_fd),
SHOW_ERRNO("Failed to close() listen fd %i",
proxy->listen_fd)
);
proxy->listen_fd = -1;
}
if (-1 != proxy->downstream_fd) {
WARN_IF_NEGATIVE(sock_try_close(proxy->downstream_fd),
SHOW_ERRNO("Failed to close() downstream fd %i",
proxy->downstream_fd)
);
proxy->downstream_fd = -1;
}
if (-1 != proxy->upstream_fd) {
WARN_IF_NEGATIVE(sock_try_close(proxy->upstream_fd),
SHOW_ERRNO("Failed to close() upstream fd %i",
proxy->upstream_fd)
);
proxy->upstream_fd = -1;
}
info("Cleanup done");
}
/** Full lifecycle of the proxier */
int do_proxy(struct proxier *params)
{
NULLCHECK(params);
info("Ensuring upstream server is open");
if (!proxy_connect_to_upstream(params)) {
warn("Couldn't connect to upstream server during initialization, exiting");
proxy_cleanup(params);
return 1;
};
proxy_open_listen_socket(params);
proxy_accept_loop(params);
proxy_cleanup(params);
return 0;
}

src/proxy/proxy.h Normal file

@@ -0,0 +1,97 @@
#ifndef PROXY_H
#define PROXY_H
#include <sys/types.h>
#include <unistd.h>
#include "ioutil.h"
#include "parse.h"
#include "nbdtypes.h"
#include "self_pipe.h"
#ifdef PREFETCH
#include "prefetch.h"
#endif
/** UPSTREAM_TIMEOUT
* How long ( in ms ) to allow for upstream to respond. If it takes longer
* than this, we will cancel the current request-response to them and resubmit
*/
#define UPSTREAM_TIMEOUT (30 * 1000)
struct proxier {
/** address/port to bind to */
union mysockaddr listen_on;
/** address/port to connect to */
union mysockaddr connect_to;
/** address to bind to when making outgoing connections */
union mysockaddr connect_from;
int bind; /* Set to true if we should use it */
/* The socket we listen() on and accept() against */
int listen_fd;
/* The socket returned by accept() that we receive requests from and send
* responses to
*/
int downstream_fd;
/* The socket returned by connect() that we send requests to and receive
* responses from
*/
int upstream_fd;
/* This is the size we advertise to the downstream client */
uint64_t upstream_size;
/* These are the transmission flags sent as part of the handshake */
uint32_t upstream_flags;
/* We transform the raw request header into here */
struct nbd_request req_hdr;
/* We transform the raw reply header into here */
struct nbd_reply rsp_hdr;
/* Used for our non-blocking negotiation with upstream. TODO: maybe use
* for downstream as well ( we currently overload rsp ) */
struct iobuf init;
/* The current NBD request from downstream */
struct iobuf req;
/* The current NBD reply from upstream */
struct iobuf rsp;
/* It's starting to feel like we need an object for a single proxy session.
* These two track how many requests we've sent so far, and whether the
* NBD_INIT code has been sent to the client yet.
*/
uint64_t req_count;
int hello_sent;
/** These are only used if we pass --cache on the command line */
/* While the in-flight request has been munged by prefetch, these two are
* set to true, and the original length of the request, respectively */
int is_prefetch_req;
uint32_t prefetch_req_orig_len;
/* And here, we actually store the prefetched data once it's returned */
struct prefetch *prefetch;
/** */
};
struct proxier *proxy_create(char *s_downstream_address,
char *s_downstream_port,
char *s_upstream_address,
char *s_upstream_port,
char *s_upstream_bind, char *s_cache_bytes);
int do_proxy(struct proxier *proxy);
void proxy_cleanup(struct proxier *proxy);
void proxy_destroy(struct proxier *proxy);
#endif
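The UPSTREAM_TIMEOUT comment describes a millisecond budget that the session loop compares against the time spent in the current state. A minimal sketch of that comparison follows, assuming a POSIX system with CLOCK_MONOTONIC; `sketch_monotonic_ms()`, `sketch_upstream_timed_out()` and `SKETCH_UPSTREAM_TIMEOUT_MS` are hypothetical names, not the project's own helpers.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Parenthesised so the macro expands safely inside larger expressions. */
#define SKETCH_UPSTREAM_TIMEOUT_MS (30 * 1000)

/* Milliseconds from a clock that is not affected by wall-clock changes. */
uint64_t sketch_monotonic_ms(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);

    assert(rc == 0);
    (void) rc;                  /* keeps NDEBUG builds warning-free */
    return (uint64_t) ts.tv_sec * 1000 + (uint64_t) ts.tv_nsec / 1000000;
}

/* Returns 1 once strictly more than the timeout has elapsed. */
int sketch_upstream_timed_out(uint64_t state_started_ms, uint64_t now_ms)
{
    return now_ms - state_started_ms > SKETCH_UPSTREAM_TIMEOUT_MS;
}
```

Using a monotonic clock rather than wall-clock time means an NTP step or a manual clock change cannot trigger a spurious reconnect.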


@@ -1,250 +0,0 @@
#include "nbdtypes.h"
#include "ioutil.h"
#include "sockutil.h"
#include "util.h"
#include "serve.h"
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
int socket_connect(struct sockaddr* to, struct sockaddr* from)
{
int fd = socket(to->sa_family == AF_INET ? PF_INET : PF_INET6, SOCK_STREAM, 0);
if( fd < 0 ){
warn( "Couldn't create client socket");
return -1;
}
if (NULL != from) {
if ( 0 > bind( fd, from, sizeof(struct sockaddr_in6 ) ) ){
warn( SHOW_ERRNO( "bind() to source address failed" ) );
if ( 0 > close( fd ) ) { /* Non-fatal leak */
warn( SHOW_ERRNO( "Failed to close fd %i", fd ) );
}
return -1;
}
}
if ( 0 > sock_try_connect( fd, to, sizeof( struct sockaddr_in6 ), 15 ) ) {
warn( SHOW_ERRNO( "connect failed" ) );
if ( 0 > close( fd ) ) { /* Non-fatal leak */
warn( SHOW_ERRNO( "Failed to close fd %i", fd ) );
}
return -1;
}
if ( sock_set_tcp_nodelay( fd, 1 ) == -1 ) {
warn( SHOW_ERRNO( "Failed to set TCP_NODELAY" ) );
}
return fd;
}
int nbd_check_hello( struct nbd_init_raw* init_raw, off64_t* out_size )
{
if ( strncmp( init_raw->passwd, INIT_PASSWD, 8 ) != 0 ) {
warn( "wrong passwd" );
goto fail;
}
if ( be64toh( init_raw->magic ) != INIT_MAGIC ) {
warn( "wrong magic (%llx)", (unsigned long long) be64toh( init_raw->magic ) );
goto fail;
}
if ( NULL != out_size ) {
*out_size = be64toh( init_raw->size );
}
return 1;
fail:
return 0;
}
int socket_nbd_read_hello( int fd, off64_t* out_size )
{
struct nbd_init_raw init_raw;
if ( 0 > readloop( fd, &init_raw, sizeof(init_raw) ) ) {
warn( "Couldn't read init" );
return 0;
}
return nbd_check_hello( &init_raw, out_size );
}
void nbd_hello_to_buf( struct nbd_init_raw *buf, off64_t out_size )
{
struct nbd_init init;
memcpy( &init.passwd, INIT_PASSWD, 8 );
init.magic = INIT_MAGIC;
init.size = out_size;
memset( buf, 0, sizeof( struct nbd_init_raw ) ); // ensure reserved is 0s
nbd_h2r_init( &init, buf );
return;
}
int socket_nbd_write_hello(int fd, off64_t out_size)
{
struct nbd_init_raw init_raw;
nbd_hello_to_buf( &init_raw, out_size );
if ( 0 > writeloop( fd, &init_raw, sizeof( init_raw ) ) ) {
warn( SHOW_ERRNO( "failed to write hello to socket" ) );
return 0;
}
return 1;
}
void fill_request(struct nbd_request *request, int type, off64_t from, int len)
{
request->magic = htobe32(REQUEST_MAGIC);
request->type = htobe32(type);
((int*) request->handle)[0] = rand();
((int*) request->handle)[1] = rand();
request->from = htobe64(from);
request->len = htobe32(len);
}
void read_reply(int fd, struct nbd_request *request, struct nbd_reply *reply)
{
struct nbd_reply_raw reply_raw;
ERROR_IF_NEGATIVE(readloop(fd, &reply_raw, sizeof(struct nbd_reply_raw)),
"Couldn't read reply");
nbd_r2h_reply( &reply_raw, reply );
if (reply->magic != REPLY_MAGIC) {
error("Reply magic incorrect (%x)", reply->magic);
}
if (reply->error != 0) {
error("Server replied with error %d", reply->error);
}
if (memcmp(request->handle, reply->handle, 8) != 0) {
error("Did not reply with correct handle");
}
}
void wait_for_data( int fd, int timeout_secs )
{
fd_set fds;
struct timeval tv = { timeout_secs, 0 };
int selected;
FD_ZERO( &fds );
FD_SET( fd, &fds );
selected = sock_try_select(
FD_SETSIZE, &fds, NULL, NULL, timeout_secs >=0 ? &tv : NULL
);
FATAL_IF( -1 == selected, "Select failed" );
ERROR_IF( 0 == selected, "Timed out waiting for reply" );
}
void socket_nbd_read(int fd, off64_t from, int len, int out_fd, void* out_buf, int timeout_secs)
{
struct nbd_request request;
struct nbd_reply reply;
fill_request(&request, REQUEST_READ, from, len);
FATAL_IF_NEGATIVE(writeloop(fd, &request, sizeof(request)),
"Couldn't write request");
wait_for_data( fd, timeout_secs );
read_reply(fd, &request, &reply);
if (out_buf) {
FATAL_IF_NEGATIVE(readloop(fd, out_buf, len),
"Read failed");
}
else {
FATAL_IF_NEGATIVE(
splice_via_pipe_loop(fd, out_fd, len),
"Splice failed"
);
}
}
void socket_nbd_write(int fd, off64_t from, int len, int in_fd, void* in_buf, int timeout_secs)
{
struct nbd_request request;
struct nbd_reply reply;
fill_request(&request, REQUEST_WRITE, from, len);
ERROR_IF_NEGATIVE(writeloop(fd, &request, sizeof(request)),
"Couldn't write request");
if (in_buf) {
ERROR_IF_NEGATIVE(writeloop(fd, in_buf, len),
"Write failed");
}
else {
ERROR_IF_NEGATIVE(
splice_via_pipe_loop(in_fd, fd, len),
"Splice failed"
);
}
wait_for_data( fd, timeout_secs );
read_reply(fd, &request, &reply);
}
int socket_nbd_disconnect( int fd )
{
int success = 1;
struct nbd_request request;
fill_request( &request, REQUEST_DISCONNECT, 0, 0 );
/* FIXME: This shouldn't be a FATAL error. We should just drop
* the mirror without affecting the main server.
*/
FATAL_IF_NEGATIVE( writeloop( fd, &request, sizeof( request ) ),
"Failed to write the disconnect request." );
return success;
}
#define CHECK_RANGE(error_type) { \
off64_t size;\
int success = socket_nbd_read_hello(params->client, &size); \
if ( success ) {\
if (params->from < 0 || (params->from + params->len) > size) {\
fatal(error_type \
" request %lld+%lld is out of range given size %lld", \
(long long) params->from, (long long) params->len, (long long) size\
);\
}\
}\
else {\
fatal( error_type " connection failed." );\
}\
}
void do_read(struct mode_readwrite_params* params)
{
params->client = socket_connect(&params->connect_to.generic, &params->connect_from.generic);
FATAL_IF_NEGATIVE( params->client, "Couldn't connect." );
CHECK_RANGE("read");
socket_nbd_read(params->client, params->from, params->len,
params->data_fd, NULL, 10);
close(params->client);
}
void do_write(struct mode_readwrite_params* params)
{
params->client = socket_connect(&params->connect_to.generic, &params->connect_from.generic);
FATAL_IF_NEGATIVE( params->client, "Couldn't connect." );
CHECK_RANGE("write");
socket_nbd_write(params->client, params->from, params->len,
params->data_fd, NULL, 10);
close(params->client);
}
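The CHECK_RANGE macro rejects a request whose offset is negative or whose end lies beyond the size the server advertised. The sketch below expresses the same test as a function and avoids adding offset and length together, so it cannot overflow. `sketch_request_in_range()` is a hypothetical helper, and the extra rejection of negative lengths and sizes is an assumption that goes beyond what the macro checks.

```c
#include <stdint.h>

/* Returns 1 when [from, from + len) lies inside [0, size), 0 otherwise.
 * The comparison is arranged so that from + len is never computed. */
int sketch_request_in_range(int64_t from, int64_t len, int64_t size)
{
    if (from < 0 || len < 0 || size < 0) {
        return 0;
    }
    if (from > size) {
        return 0;
    }
    return len <= size - from;
}
```

A zero-length request at offset `size` is accepted by this sketch, which matches the macro's use of a strict greater-than comparison.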


@@ -1,23 +0,0 @@
#ifndef READWRITE_H
#define READWRITE_H
#include <sys/types.h>
#include <sys/socket.h>
#include "nbdtypes.h"
int socket_connect(struct sockaddr* to, struct sockaddr* from);
int socket_nbd_read_hello(int fd, off64_t * size);
int socket_nbd_write_hello(int fd, off64_t size);
void socket_nbd_read(int fd, off64_t from, int len, int out_fd, void* out_buf, int timeout_secs);
void socket_nbd_write(int fd, off64_t from, int len, int in_fd, void* in_buf, int timeout_secs);
int socket_nbd_disconnect( int fd );
/* as you can see, we're slowly accumulating code that should really be in an
* NBD library */
void nbd_hello_to_buf( struct nbd_init_raw* buf, off64_t out_size );
int nbd_check_hello( struct nbd_init_raw* init_raw, off64_t* out_size );
#endif
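nbd_hello_to_buf() and nbd_check_hello() are a build/verify pair for the handshake the server sends first. The sketch below shows one way such a pair can round-trip a size through a byte buffer. It is written under stated assumptions: the layout (8-byte password, 8-byte magic, 8-byte big-endian size, 128 reserved bytes) and the two constants follow the old-style NBD handshake as I understand it, and every `sketch_` name is hypothetical rather than taken from nbdtypes.h.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed old-style handshake constants; check them against nbdtypes.h. */
#define SKETCH_INIT_PASSWD "NBDMAGIC"
#define SKETCH_INIT_MAGIC  UINT64_C(0x0000420281861253)
#define SKETCH_INIT_SIZE   152      /* 8 + 8 + 8 + 128 reserved bytes */

/* Store v at out[0..7], most significant byte first. */
static void sketch_put_be64(unsigned char *out, uint64_t v)
{
    for (int i = 7; i >= 0; i--) {
        out[i] = (unsigned char) (v & 0xff);
        v >>= 8;
    }
}

/* Load a big-endian 64-bit value from in[0..7]. */
static uint64_t sketch_get_be64(const unsigned char *in)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++) {
        v = (v << 8) | in[i];
    }
    return v;
}

/* Fill buf (SKETCH_INIT_SIZE bytes) with a hello advertising `size`. */
void sketch_hello_to_buf(unsigned char *buf, uint64_t size)
{
    assert(buf != NULL);
    memset(buf, 0, SKETCH_INIT_SIZE);       /* reserved bytes stay zero */
    memcpy(buf, SKETCH_INIT_PASSWD, 8);
    sketch_put_be64(buf + 8, SKETCH_INIT_MAGIC);
    sketch_put_be64(buf + 16, size);
}

/* Returns 1 and stores the size (if out_size is non-NULL) when the
 * password and magic match, 0 otherwise. */
int sketch_check_hello(const unsigned char *buf, uint64_t *out_size)
{
    assert(buf != NULL);
    if (memcmp(buf, SKETCH_INIT_PASSWD, 8) != 0) {
        return 0;
    }
    if (sketch_get_be64(buf + 8) != SKETCH_INIT_MAGIC) {
        return 0;
    }
    if (out_size != NULL) {
        *out_size = sketch_get_be64(buf + 16);
    }
    return 1;
}
```

Encoding byte by byte keeps the sketch independent of the host's endianness and of non-standard headers such as endian.h.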


@@ -1,69 +0,0 @@
#include "ioutil.h"
#include "util.h"
#include <stdlib.h>
#include <sys/un.h>
static const int max_response=1024;
void print_response( const char * response )
{
char * response_text;
FILE * out;
int exit_status;
NULLCHECK( response );
exit_status = atoi(response);
response_text = strchr( response, ':' );
FATAL_IF_NULL( response_text,
"Error parsing server response: '%s'", response );
out = exit_status > 0 ? stderr : stdout;
fprintf(out, "%s\n", response_text + 2);
}
void do_remote_command(char* command, char* socket_name, int argc, char** argv)
{
char newline=10;
int i;
debug( "connecting to run remote command %s", command );
int remote = socket(AF_UNIX, SOCK_STREAM, 0);
struct sockaddr_un address;
char response[max_response];
memset(&address, 0, sizeof(address));
FATAL_IF_NEGATIVE(remote, "Couldn't create client socket");
address.sun_family = AF_UNIX;
strncpy(address.sun_path, socket_name, sizeof(address.sun_path) - 1);
FATAL_IF_NEGATIVE(
connect(remote, (struct sockaddr*) &address, sizeof(address)),
"Couldn't connect to %s", socket_name
);
write(remote, command, strlen(command));
write(remote, &newline, 1);
for (i=0; i<argc; i++) {
if ( NULL != argv[i] ) {
write(remote, argv[i], strlen(argv[i]));
}
write(remote, &newline, 1);
}
write(remote, &newline, 1);
FATAL_IF_NEGATIVE(
read_until_newline(remote, response, max_response),
"Couldn't read response from %s", socket_name
);
print_response( response );
close(remote);
exit(atoi(response));
}
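print_response() treats the control socket's reply as a numeric status, a colon, and then the text to show. The following sketch parses that shape defensively. The "status: text" format is inferred from the code above rather than from a protocol description, and `sketch_parse_response()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Splits "<status>: <text>" into its two parts. Returns 1 on success and
 * 0 when the response contains no colon. *text points into `response`. */
int sketch_parse_response(const char *response, int *status,
                          const char **text)
{
    assert(response != NULL && status != NULL && text != NULL);

    const char *colon = strchr(response, ':');
    if (colon == NULL) {
        return 0;
    }

    *status = atoi(response);

    /* Skip the colon and at most one following space, so a response that
     * ends right after the colon yields an empty string. */
    colon++;
    if (*colon == ' ') {
        colon++;
    }
    *text = colon;
    return 1;
}
```

Stepping past the separator one character at a time avoids reading beyond the terminating NUL when the response ends immediately after the colon.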


@@ -1,148 +0,0 @@
/**
* self_pipe.c
*
* author: Alex Young <alex@bytemark.co.uk>
*
* Wrapper for the self-pipe trick for select()-based thread
* synchronisation. Get yourself a self_pipe with self_pipe_create(),
* select() on the read end of the pipe with the help of
* self_pipe_fd_set( sig, fds ) and self_pipe_fd_isset( sig, fds ).
* When you've received a signal, clear it with
* self_pipe_signal_clear(sig) so that the buffer doesn't get filled.
*
*/
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include "util.h"
#include "self_pipe.h"
#define ERR_MSG_PIPE "Couldn't open a pipe for signaling."
#define ERR_MSG_FCNTL "Couldn't set a signalling pipe non-blocking."
#define ERR_MSG_WRITE "Couldn't write to a signaling pipe."
#define ERR_MSG_READ "Couldn't read from a signaling pipe."
void self_pipe_server_error( int err, char *msg )
{
char errbuf[1024] = {0};
strerror_r( err, errbuf, 1024 );
fatal( "%s\t%d (%s)", msg, err, errbuf );
}
/**
* Allocate a struct self_pipe, opening the pipe.
*
* Returns NULL if the pipe couldn't be opened or if we couldn't set it
* non-blocking.
*
* Remember to call self_pipe_destroy when you're done with the return
* value.
*/
struct self_pipe * self_pipe_create(void)
{
struct self_pipe *sig = xmalloc( sizeof( struct self_pipe ) );
int fds[2];
int fcntl_err;
if ( NULL == sig ) { return NULL; }
if ( pipe( fds ) ) {
free( sig );
self_pipe_server_error( errno, ERR_MSG_PIPE );
return NULL;
}
if ( fcntl( fds[0], F_SETFL, O_NONBLOCK ) || fcntl( fds[1], F_SETFL, O_NONBLOCK ) ) {
fcntl_err = errno;
while( close( fds[0] ) == -1 && errno == EINTR );
while( close( fds[1] ) == -1 && errno == EINTR );
free( sig );
self_pipe_server_error( fcntl_err, ERR_MSG_FCNTL );
return NULL;
}
sig->read_fd = fds[0];
sig->write_fd = fds[1];
return sig;
}
/**
* Send a signal to anyone select()ing on this signal.
*
* Returns 1 on success. Can fail if weirdness happened to the write fd
* of the pipe in the self_pipe struct.
*/
int self_pipe_signal( struct self_pipe * sig )
{
NULLCHECK( sig );
FATAL_IF( 1 == sig->write_fd, "Shouldn't be writing to stdout" );
FATAL_IF( 2 == sig->write_fd, "Shouldn't be writing to stderr" );
int written = write( sig->write_fd, "X", 1 );
if ( written != 1 ) {
self_pipe_server_error( errno, ERR_MSG_WRITE );
return 0;
}
return 1;
}
/**
* Clear a received signal from the pipe. Every signal sent must be
* cleared by one (and only one) recipient when they return from select()
* if the signal is to be used more than once.
* Returns 1 if a signal was read from the pipe and 0 if there was no
* signal to clear (or the read failed).
*/
int self_pipe_signal_clear( struct self_pipe *sig )
{
char buf[1];
return 1 == read( sig->read_fd, buf, 1 );
}
/**
* Close the pipe and free the self_pipe. Do not try to use the
* self_pipe struct after calling this, the innards are mush.
*/
int self_pipe_destroy( struct self_pipe * sig )
{
NULLCHECK(sig);
while( close( sig->read_fd ) == -1 && errno == EINTR );
while( close( sig->write_fd ) == -1 && errno == EINTR );
/* Just in case anyone *does* try to use this after free,
* we should set the memory locations to an error value
*/
sig->read_fd = -1;
sig->write_fd = -1;
free( sig );
return 1;
}
int self_pipe_fd_set( struct self_pipe * sig, fd_set * fds)
{
FD_SET( sig->read_fd, fds );
return 1;
}
int self_pipe_fd_isset( struct self_pipe * sig, fd_set * fds)
{
return FD_ISSET( sig->read_fd, fds );
}
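The self-pipe trick described in the file header comes down to three steps: open a non-blocking pipe, select() on its read end, and drain one byte per signal. A minimal sketch using only POSIX calls follows. `sketch_pipe_open()` and `sketch_wait_for_signal()` are hypothetical names that stand in for, but do not reproduce, the self_pipe_* functions above.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Opens a pipe with both ends non-blocking. Returns 0 on success, -1 on
 * failure (with nothing left open). */
int sketch_pipe_open(int fds[2])
{
    if (pipe(fds) != 0) {
        return -1;
    }
    if (fcntl(fds[0], F_SETFL, O_NONBLOCK) != 0
        || fcntl(fds[1], F_SETFL, O_NONBLOCK) != 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    return 0;
}

/* Returns 1 if read_fd became readable within timeout_ms, 0 on timeout
 * and -1 on error. Interrupted select() calls are retried. */
int sketch_wait_for_signal(int read_fd, int timeout_ms)
{
    fd_set fds;
    struct timeval tv;
    int rc;

    assert(read_fd >= 0 && read_fd < FD_SETSIZE);

    do {
        /* select() may modify both arguments, so rebuild them each time. */
        FD_ZERO(&fds);
        FD_SET(read_fd, &fds);
        tv.tv_sec = timeout_ms / 1000;
        tv.tv_usec = (timeout_ms % 1000) * 1000;
        rc = select(read_fd + 1, &fds, NULL, NULL, &tv);
    } while (rc == -1 && errno == EINTR);

    if (rc <= 0) {
        return rc;
    }
    return FD_ISSET(read_fd, &fds) ? 1 : 0;
}
```

A signaller writes one byte to the write end, and whoever wakes from select() reads one byte back; because the read end is non-blocking, a second reader racing for the same byte gets EAGAIN instead of hanging.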


@@ -1,19 +0,0 @@
#ifndef SELF_PIPE_H
#define SELF_PIPE_H
#include <sys/select.h>
struct self_pipe {
int read_fd;
int write_fd;
};
struct self_pipe * self_pipe_create(void);
int self_pipe_signal( struct self_pipe * sig );
int self_pipe_signal_clear( struct self_pipe *sig );
int self_pipe_destroy( struct self_pipe * sig );
int self_pipe_fd_set( struct self_pipe * sig, fd_set * fds );
int self_pipe_fd_isset( struct self_pipe *sig, fd_set *fds );
#endif


@@ -1,944 +0,0 @@
#include "serve.h"
#include "client.h"
#include "nbdtypes.h"
#include "ioutil.h"
#include "sockutil.h"
#include "util.h"
#include "bitset.h"
#include "control.h"
#include "self_pipe.h"
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>
#include <sys/un.h>
#include <fcntl.h>
#include <string.h>
#include <stdlib.h>
#include <errno.h>
#include <sys/socket.h>
#include <netinet/tcp.h>
struct server * server_create (
struct flexnbd * flexnbd,
char* s_ip_address,
char* s_port,
char* s_file,
int default_deny,
int acl_entries,
char** s_acl_entries,
int max_nbd_clients,
int use_killswitch,
int success)
{
NULLCHECK( flexnbd );
struct server * out;
out = xmalloc( sizeof( struct server ) );
out->flexnbd = flexnbd;
out->success = success;
out->max_nbd_clients = max_nbd_clients;
out->use_killswitch = use_killswitch;
server_allow_new_clients( out );
out->nbd_client = xmalloc( max_nbd_clients * sizeof( struct client_tbl_entry ) );
out->tcp_backlog = 10; /* does this need to be settable? */
FATAL_IF_NULL(s_ip_address, "No IP address supplied");
FATAL_IF_NULL(s_port, "No port number supplied");
FATAL_IF_NULL(s_file, "No filename supplied");
NULLCHECK( s_ip_address );
FATAL_IF_ZERO(
parse_ip_to_sockaddr(&out->bind_to.generic, s_ip_address),
"Couldn't parse server address '%s' (use 0 if "
"you want to bind to all IPs)",
s_ip_address
);
out->acl = acl_create( acl_entries, s_acl_entries, default_deny );
if (out->acl && out->acl->len != acl_entries) {
fatal("Bad ACL entry '%s'", s_acl_entries[out->acl->len]);
}
parse_port( s_port, &out->bind_to.v4 );
out->filename = s_file;
out->l_acl = flexthread_mutex_create();
out->l_start_mirror = flexthread_mutex_create();
out->mirror_can_start = 1;
out->close_signal = self_pipe_create();
out->acl_updated_signal = self_pipe_create();
NULLCHECK( out->close_signal );
NULLCHECK( out->acl_updated_signal );
return out;
}
void server_destroy( struct server * serve )
{
self_pipe_destroy( serve->acl_updated_signal );
serve->acl_updated_signal = NULL;
self_pipe_destroy( serve->close_signal );
serve->close_signal = NULL;
flexthread_mutex_destroy( serve->l_start_mirror );
flexthread_mutex_destroy( serve->l_acl );
if ( serve->acl ) {
acl_destroy( serve->acl );
serve->acl = NULL;
}
free( serve->nbd_client );
free( serve );
}
void server_unlink( struct server * serve )
{
NULLCHECK( serve );
NULLCHECK( serve->filename );
FATAL_IF_NEGATIVE( unlink( serve->filename ),
"Failed to unlink %s: %s",
serve->filename,
strerror( errno ) );
}
#define SERVER_LOCK( s, f, msg ) \
do { NULLCHECK( s ); \
FATAL_IF( 0 != flexthread_mutex_lock( s->f ), msg ); } while (0)
#define SERVER_UNLOCK( s, f, msg ) \
do { NULLCHECK( s ); \
FATAL_IF( 0 != flexthread_mutex_unlock( s->f ), msg ); } while (0)
void server_lock_acl( struct server *serve )
{
debug("ACL locking");
SERVER_LOCK( serve, l_acl, "Problem with ACL lock" );
}
void server_unlock_acl( struct server *serve )
{
debug( "ACL unlocking" );
SERVER_UNLOCK( serve, l_acl, "Problem with ACL unlock" );
}
int server_acl_locked( struct server * serve )
{
NULLCHECK( serve );
return flexthread_mutex_held( serve->l_acl );
}
void server_lock_start_mirror( struct server *serve )
{
debug("Mirror start locking");
SERVER_LOCK( serve, l_start_mirror, "Problem with start mirror lock" );
}
void server_unlock_start_mirror( struct server *serve )
{
debug("Mirror start unlocking");
SERVER_UNLOCK( serve, l_start_mirror, "Problem with start mirror unlock" );
}
int server_start_mirror_locked( struct server * serve )
{
NULLCHECK( serve );
return flexthread_mutex_held( serve->l_start_mirror );
}
/** Return the actual port the server bound to. This is used because we
* are allowed to pass "0" on the command-line.
*/
int server_port( struct server * server )
{
NULLCHECK( server );
union mysockaddr addr;
socklen_t len = sizeof( addr.v4 );
if ( getsockname( server->server_fd, &addr.v4, &len ) < 0 ) {
fatal( "Failed to get the port number." );
}
return be16toh( addr.v4.sin_port );
}
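
The comment on server_port() says the server may be started with port "0". A minimal sketch of that idea, assuming an IPv4 loopback socket: bind with port 0 so the kernel picks a free port, then read the choice back with getsockname(). The helper name bound_port_v4 is hypothetical and not part of flexnbd, and ntohs() stands in for the be16toh() call used above.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper (not from flexnbd): bind an IPv4 TCP socket to
 * 127.0.0.1 with port 0, letting the kernel choose a free port, then ask
 * getsockname() which port was picked. Returns the port in host byte
 * order, or -1 on failure. The socket is closed before returning. */
int bound_port_v4(void)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int port = -1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0) {
        return -1;
    }
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);           /* 0 means "any free port" */

    if (bind(fd, (struct sockaddr *) &addr, sizeof(addr)) == 0 &&
        getsockname(fd, (struct sockaddr *) &addr, &len) == 0) {
        assert(addr.sin_family == AF_INET);  /* sanity check on the reply */
        port = ntohs(addr.sin_port);         /* network to host byte order */
    }
    close(fd);
    return port;
}
```

Calling bound_port_v4() twice will usually return two different ports, which is why the real server has to ask the socket rather than trust the command line.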
/** Prepares a listening socket for the NBD server, binding etc. */
void serve_open_server_socket(struct server* params)
{
NULLCHECK( params );
params->server_fd = socket(params->bind_to.generic.sa_family == AF_INET ?
PF_INET : PF_INET6, SOCK_STREAM, 0);
FATAL_IF_NEGATIVE( params->server_fd, "Couldn't create server socket" );
/* We need SO_REUSEADDR so that when we switch from listening to
* serving we don't have to change address if we don't want to.
*
* If this fails, it's not necessarily bad in principle, but at
* this point in the code we can't tell if it's going to be a
* problem. It's also indicative of something odd going on, so
* we barf.
*/
FATAL_IF_NEGATIVE(
sock_set_reuseaddr( params->server_fd, 1 ), "Couldn't set SO_REUSEADDR"
);
/* TCP_NODELAY makes everything not be slow. If we can't set
* this, again, there's something odd going on which we don't
* understand.
*/
FATAL_IF_NEGATIVE(
sock_set_tcp_nodelay( params->server_fd, 1 ), "Couldn't set TCP_NODELAY"
);
/* If we can't bind, presumably that's because someone else is
* squatting on our ip/port combo, or the ip isn't yet
* configured. Ideally we want to retry this. */
FATAL_UNLESS_ZERO(
sock_try_bind( params->server_fd, &params->bind_to.generic ),
SHOW_ERRNO( "Failed to bind() socket" )
);
FATAL_IF_NEGATIVE(
listen(params->server_fd, params->tcp_backlog),
"Couldn't listen on server socket"
);
}
int tryjoin_client_thread( struct client_tbl_entry *entry, int (*joinfunc)(pthread_t, void **) )
{
NULLCHECK( entry );
NULLCHECK( joinfunc );
int was_closed = 0;
void * status=NULL;
int join_errno;
if (entry->thread != 0) {
char s_client_address[128];
sockaddr_address_string( &entry->address.generic, &s_client_address[0], 128 );
debug( "%s(%p,...)", joinfunc == pthread_join ? "joining" : "tryjoining", entry->thread );
join_errno = joinfunc(entry->thread, &status);
/* join_errno can legitimately be ESRCH if the thread is
* already dead, but the client still needs tidying up. */
if (join_errno != 0 && !entry->client->stopped ) {
debug( "join_errno was %s, stopped was %d", strerror( join_errno ), entry->client->stopped );
FATAL_UNLESS( join_errno == EBUSY,
"Problem with joining thread %p: %s",
entry->thread,
strerror(join_errno) );
}
else if ( join_errno == 0 ) {
debug("nbd thread %016x exited (%s) with status %ld",
entry->thread,
s_client_address,
(uint64_t)status);
client_destroy( entry->client );
entry->client = NULL;
entry->thread = 0;
was_closed = 1;
}
}
return was_closed;
}
/**
* Check to see if a client thread has finished, and if so, tidy up
* after it.
* Returns 1 if the thread was cleaned up and the slot freed, 0
* otherwise.
*
* It's important that client_destroy gets called in the same thread
* which signals the client threads to stop. This avoids the
* possibility of sending a stop signal via a signal which has already
* been destroyed. However, it means that stopped client threads,
* including their signal pipes, won't be cleaned up until the next new
* client connection attempt.
*/
int cleanup_client_thread( struct client_tbl_entry * entry )
{
return tryjoin_client_thread( entry, pthread_tryjoin_np );
}
void cleanup_client_threads( struct client_tbl_entry * entries, size_t entries_len )
{
size_t i;
for( i = 0; i < entries_len; i++ ) {
cleanup_client_thread( &entries[i] );
}
}
/**
* Join a client thread after having sent a stop signal to it.
* This function will not return until pthread_join has returned, so
* ensures that the client thread is dead.
*/
int join_client_thread( struct client_tbl_entry *entry )
{
return tryjoin_client_thread( entry, pthread_join );
}
/** We can only accommodate MAX_NBD_CLIENTS connections at once. This function
* goes through the current list, waits for any threads that have finished
* and returns the next slot free (or -1 if there are none).
*/
int cleanup_and_find_client_slot(struct server* params)
{
NULLCHECK( params );
int slot=-1, i;
cleanup_client_threads( params->nbd_client, params->max_nbd_clients );
for ( i = 0; i < params->max_nbd_clients; i++ ) {
if( params->nbd_client[i].thread == 0 && slot == -1 ){
slot = i;
break;
}
}
return slot;
}
int server_count_clients( struct server *params )
{
NULLCHECK( params );
int i, count = 0;
for ( i = 0 ; i < params->max_nbd_clients ; i++ ) {
if ( params->nbd_client[i].thread != 0 ) {
count++;
}
}
return count;
}
/** Check whether the address client_address is allowed or not according
* to the current acl. If params->acl is NULL, the result will be 1,
* otherwise it will be the result of acl_includes().
*/
int server_acl_accepts( struct server *params, union mysockaddr * client_address )
{
NULLCHECK( params );
NULLCHECK( client_address );
struct acl * acl;
int accepted;
server_lock_acl( params );
{
acl = params->acl;
accepted = acl ? acl_includes( acl, client_address ) : 1;
}
server_unlock_acl( params );
return accepted;
}
int server_should_accept_client(
struct server * params,
union mysockaddr * client_address,
char *s_client_address,
size_t s_client_address_len )
{
NULLCHECK( params );
NULLCHECK( client_address );
NULLCHECK( s_client_address );
const char* result = sockaddr_address_string(
&client_address->generic, s_client_address, s_client_address_len
);
if ( NULL == result ) {
warn( "Rejecting client %s: Bad client_address", s_client_address );
return 0;
}
if ( !server_acl_accepts( params, client_address ) ) {
warn( "Rejecting client %s: Access control error", s_client_address );
debug( "We %s have an acl, and default_deny is %s",
(params->acl ? "do" : "do not"),
(params->acl->default_deny ? "true" : "false") );
return 0;
}
return 1;
}
int spawn_client_thread(
struct client * client_params,
pthread_t *out_thread)
{
int result = pthread_create(out_thread, NULL, client_serve, client_params);
return result;
}
/** Dispatch function for accepting an NBD connection and starting a thread
* to handle it. Rejects the connection if there is an ACL, and the far end's
* address doesn't match, or if there are too many clients already connected.
*/
void accept_nbd_client(
struct server* params,
int client_fd,
union mysockaddr* client_address)
{
NULLCHECK(params);
NULLCHECK(client_address);
struct client* client_params;
int slot;
char s_client_address[64] = {0};
if ( !server_should_accept_client( params, client_address, s_client_address, 64 ) ) {
FATAL_IF_NEGATIVE( close( client_fd ),
"Error closing client socket fd %d", client_fd );
debug("Closed client socket fd %d", client_fd);
return;
}
slot = cleanup_and_find_client_slot(params);
if (slot < 0) {
warn("too many clients to accept connection");
FATAL_IF_NEGATIVE( close( client_fd ),
"Error closing client socket fd %d", client_fd );
debug("Closed client socket fd %d", client_fd);
return;
}
info( "Client %s accepted on fd %d.", s_client_address, client_fd );
client_params = client_create( params, client_fd );
params->nbd_client[slot].client = client_params;
memcpy(&params->nbd_client[slot].address, client_address,
sizeof(union mysockaddr));
pthread_t * thread = &params->nbd_client[slot].thread;
if ( 0 != spawn_client_thread( client_params, thread ) ) {
debug( "Thread creation problem." );
client_destroy( client_params );
FATAL_IF_NEGATIVE( close(client_fd),
"Error closing client socket fd %d", client_fd );
debug("Closed client socket fd %d", client_fd);
return;
}
debug("nbd thread %p started (%s)", params->nbd_client[slot].thread, s_client_address);
}
void server_audit_clients( struct server * serve)
{
NULLCHECK( serve );
int i;
struct client_tbl_entry * entry;
/* There's an apparent race here. If the acl updates while
* we're traversing the nbd_clients array, the earlier entries
* won't have been audited against the later acl. This isn't a
* problem though, because in order to update the acl
* server_replace_acl must have been called, so the
* server_accept loop will see a second acl_updated signal as
* soon as it hits select, and a second audit will be run.
*/
for( i = 0; i < serve->max_nbd_clients; i++ ) {
entry = &serve->nbd_client[i];
if ( 0 == entry->thread ) { continue; }
if ( server_acl_accepts( serve, &entry->address ) ) { continue; }
client_signal_stop( entry->client );
}
}
int server_is_closed(struct server* serve)
{
NULLCHECK( serve );
return fd_is_closed( serve->server_fd );
}
void server_close_clients( struct server *params )
{
NULLCHECK(params);
info("closing all clients");
int i; /* , j; */
struct client_tbl_entry *entry;
for( i = 0; i < params->max_nbd_clients; i++ ) {
entry = &params->nbd_client[i];
if ( entry->thread != 0 ) {
debug( "Stop signaling client %p", entry->client );
client_signal_stop( entry->client );
}
}
/* We don't join the clients here. When we enter the final
* mirror pass, we get the IO lock, then wait for the server_fd
* to close before sending the data, to be sure that no new
* clients can be accepted which might think they've written
* to the disc. However, an existing client thread can be
* waiting for the IO lock already, so if we try to join it
* here, we deadlock.
*
* The client threads will be joined in serve_cleanup.
*
*/
}
/** Replace the current acl with a new one. The old one will be thrown
* away.
*/
void server_replace_acl( struct server *serve, struct acl * new_acl )
{
NULLCHECK(serve);
NULLCHECK(new_acl);
/* We need to lock around updates to the acl in case we try to
* destroy the old acl while checking against it.
*/
server_lock_acl( serve );
{
struct acl * old_acl = serve->acl;
serve->acl = new_acl;
/* We should always have an old_acl, but just in case... */
if ( old_acl ) { acl_destroy( old_acl ); }
}
server_unlock_acl( serve );
self_pipe_signal( serve->acl_updated_signal );
}
void server_prevent_mirror_start( struct server *serve )
{
NULLCHECK( serve );
serve->mirror_can_start = 0;
}
void server_allow_mirror_start( struct server *serve )
{
NULLCHECK( serve );
serve->mirror_can_start = 1;
}
/* Only call this with the mirror start lock held */
int server_mirror_can_start( struct server *serve )
{
NULLCHECK( serve );
return serve->mirror_can_start;
}
/* Queries to see if we are currently mirroring. If we are, we need
* to communicate that via the process exit status, because otherwise
* the supervisor will assume the migration completed.
*/
int serve_shutdown_is_graceful( struct server *params )
{
int is_mirroring = 0;
server_lock_start_mirror( params );
{
if ( server_is_mirroring( params ) ) {
is_mirroring = 1;
warn( "Stop signal received while mirroring." );
server_prevent_mirror_start( params );
}
}
server_unlock_start_mirror( params );
return !is_mirroring;
}
/** Accept either an NBD or control socket connection, dispatch appropriately */
int server_accept( struct server * params )
{
NULLCHECK( params );
debug("accept loop starting");
int client_fd;
union mysockaddr client_address;
fd_set fds;
socklen_t socklen=sizeof(client_address);
/* We select on this fd to receive OS signals (only a few of
* which we're interested in, see flexnbd.c) */
int signal_fd = flexnbd_signal_fd( params->flexnbd );
int should_continue = 1;
FD_ZERO(&fds);
FD_SET(params->server_fd, &fds);
if( 0 < signal_fd ) { FD_SET(signal_fd, &fds); }
self_pipe_fd_set( params->close_signal, &fds );
self_pipe_fd_set( params->acl_updated_signal, &fds );
FATAL_IF_NEGATIVE(
sock_try_select(FD_SETSIZE, &fds, NULL, NULL, NULL),
SHOW_ERRNO( "select() failed" )
);
if ( self_pipe_fd_isset( params->close_signal, &fds ) ){
server_close_clients( params );
should_continue = 0;
}
if ( 0 < signal_fd && FD_ISSET( signal_fd, &fds ) ){
debug( "Stop signal received." );
server_close_clients( params );
params->success = params->success && serve_shutdown_is_graceful( params );
should_continue = 0;
}
if ( self_pipe_fd_isset( params->acl_updated_signal, &fds ) ) {
self_pipe_signal_clear( params->acl_updated_signal );
server_audit_clients( params );
}
if ( FD_ISSET( params->server_fd, &fds ) ){
client_fd = accept( params->server_fd, &client_address.generic, &socklen );
if ( params->allow_new_clients ) {
debug("Accepted nbd client socket fd %d", client_fd);
accept_nbd_client(params, client_fd, &client_address);
} else {
debug( "New NBD client socket %d not allowed", client_fd );
sock_try_close( client_fd );
}
}
return should_continue;
}
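
server_accept() waits in select() on the listening socket plus two self-pipes, so another thread can wake the loop by writing to a pipe. The sketch below shows only that wake-up mechanism with a plain pipe(); it is not flexnbd's self_pipe implementation, and the function name pipe_wakes_select is hypothetical. It polls with a zero timeout instead of blocking.

```c
#include <assert.h>
#include <unistd.h>
#include <sys/select.h>

/* Hypothetical sketch of the self-pipe pattern: the read end of a pipe is
 * watched by select(); writing a byte to the write end makes it readable.
 * Returns 1 if the read end is reported readable, 0 if not, -1 on error. */
int pipe_wakes_select(int signal_first)
{
    int fds[2];
    fd_set readable;
    struct timeval timeout = { 0, 0 };  /* poll: do not block */
    int ready;
    int result;

    assert(signal_first == 0 || signal_first == 1);
    if (pipe(fds) != 0) {
        return -1;
    }
    if (signal_first && write(fds[1], "X", 1) != 1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    FD_ZERO(&readable);
    FD_SET(fds[0], &readable);
    ready = select(fds[0] + 1, &readable, NULL, NULL, &timeout);

    /* Only inspect the set when select() succeeded. */
    result = (ready < 0) ? -1 : (FD_ISSET(fds[0], &readable) ? 1 : 0);
    close(fds[0]);
    close(fds[1]);
    return result;
}
```

A real loop would also drain the pipe after waking, which is the job self_pipe_signal_clear() appears to do for acl_updated_signal above.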
void serve_accept_loop(struct server* params)
{
NULLCHECK( params );
while( server_accept( params ) );
}
void* build_allocation_map_thread(void* serve_uncast)
{
NULLCHECK( serve_uncast );
struct server* serve = (struct server*) serve_uncast;
NULLCHECK( serve->filename );
NULLCHECK( serve->allocation_map );
int fd = open( serve->filename, O_RDONLY );
FATAL_IF_NEGATIVE( fd, "Couldn't open %s", serve->filename );
if ( build_allocation_map( serve->allocation_map, fd ) ) {
serve->allocation_map_built = 1;
}
else {
/* We can operate without it, but we can't free it without a race.
* All that happens if we leave it is that it gradually builds up an
* *incomplete* record of writes. Nobody will use it, as
* allocation_map_built == 0 for the lifetime of the process.
*
* The stream functionality can still be relied on. We don't need to
* worry about mirroring waiting for the allocation map to finish,
* because we already copy every byte at least once. If that changes in
* the future, we'll need to wait for the allocation map to finish or
* fail before we can complete the migration.
*/
warn( "Didn't build allocation map for %s", serve->filename );
}
close( fd );
return NULL;
}
/** Initialisation function that sets up the initial allocation map, i.e. so
* we know which blocks of the file are allocated.
*/
void serve_init_allocation_map(struct server* params)
{
NULLCHECK( params );
NULLCHECK( params->filename );
int fd = open( params->filename, O_RDONLY );
off64_t size;
FATAL_IF_NEGATIVE(fd, "Couldn't open %s", params->filename );
size = lseek64( fd, 0, SEEK_END );
params->size = size;
FATAL_IF_NEGATIVE( size, "Couldn't find size of %s",
params->filename );
params->allocation_map =
bitset_alloc( params->size, block_allocation_resolution );
int ok = pthread_create( &params->allocation_map_builder_thread,
NULL,
build_allocation_map_thread,
params );
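/* pthread_create returns 0 on success or a positive error number,
 * never a negative value, so test for non-zero. */
FATAL_UNLESS_ZERO( ok, "Couldn't create thread" );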
}
void server_forbid_new_clients( struct server * serve )
{
serve->allow_new_clients = 0;
return;
}
void server_allow_new_clients( struct server * serve )
{
serve->allow_new_clients = 1;
return;
}
void server_join_clients( struct server * serve ) {
int i;
void* status;
for (i=0; i < serve->max_nbd_clients; i++) {
pthread_t thread_id = serve->nbd_client[i].thread;
int err = 0;
if (thread_id != 0) {
debug( "joining thread %p", thread_id );
if ( 0 == (err = pthread_join( thread_id, &status ) ) ) {
serve->nbd_client[i].thread = 0;
} else {
warn( "Error %s (%i) joining thread %p", strerror( err ), err, thread_id );
}
}
}
return;
}
/* Tell the server to close all the things. */
void serve_signal_close( struct server * serve )
{
NULLCHECK( serve );
info("signalling close");
self_pipe_signal( serve->close_signal );
}
/* Block until the server closes the server_fd.
*/
void serve_wait_for_close( struct server * serve )
{
while( !fd_is_closed( serve->server_fd ) ){
usleep(10000);
}
}
/* We've just had a DISCONNECT pair, so we need to shut down
* and signal our listener that we can safely take over.
*/
void server_control_arrived( struct server *serve )
{
debug( "server_control_arrived" );
NULLCHECK( serve );
if ( !serve->success ) {
serve->success = 1;
serve_signal_close( serve );
}
}
void flexnbd_stop_control( struct flexnbd * flexnbd );
/** Closes sockets, frees memory and waits for all client threads to finish */
void serve_cleanup(struct server* params,
int fatal __attribute__ ((unused)) )
{
NULLCHECK( params );
void* status;
info("cleaning up");
if (params->server_fd){ close(params->server_fd); }
/* need to stop background build if we're killed very early on */
pthread_cancel(params->allocation_map_builder_thread);
pthread_join(params->allocation_map_builder_thread, &status);
int need_mirror_lock;
need_mirror_lock = !server_start_mirror_locked( params );
if ( need_mirror_lock ) { server_lock_start_mirror( params ); }
{
if ( server_is_mirroring( params ) ) {
server_abandon_mirror( params );
}
server_prevent_mirror_start( params );
}
if ( need_mirror_lock ) { server_unlock_start_mirror( params ); }
server_join_clients( params );
if (params->allocation_map) {
bitset_free( params->allocation_map );
}
if ( server_start_mirror_locked( params ) ) {
server_unlock_start_mirror( params );
}
if ( server_acl_locked( params ) ) {
server_unlock_acl( params );
}
/* if( params->flexnbd ) { */
/* if ( params->flexnbd->control ) { */
/* flexnbd_stop_control( params->flexnbd ); */
/* } */
/* flexnbd_destroy( params->flexnbd ); */
/* } */
/* server_destroy( params ); */
debug( "Cleanup done");
}
int server_is_in_control( struct server *serve )
{
NULLCHECK( serve );
return serve->success;
}
int server_is_mirroring( struct server * serve )
{
NULLCHECK( serve );
return !!serve->mirror_super;
}
uint64_t server_mirror_bytes_remaining( struct server * serve )
{
if ( server_is_mirroring( serve ) ) {
uint64_t bytes_to_xfer =
bitset_stream_queued_bytes( serve->allocation_map, BITSET_STREAM_SET ) +
( serve->size - serve->mirror->offset );
return bytes_to_xfer;
}
return 0;
}
/* Given historic bps measurements and number of bytes left to transfer, give
* an estimate of how many seconds are remaining before the migration is
* complete, assuming no new bytes are written.
*/
uint64_t server_mirror_eta( struct server * serve )
{
if ( server_is_mirroring( serve ) ) {
uint64_t bytes_to_xfer = server_mirror_bytes_remaining( serve );
return bytes_to_xfer / ( mirror_current_bps( serve->mirror ) + 1 );
}
return 0;
}
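
The ETA above is remaining bytes divided by the current rate plus one. A standalone sketch of that arithmetic, assuming the rate is an unsigned bytes-per-second figure (the return type of mirror_current_bps() is not shown here); eta_seconds is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical standalone version of the estimate: remaining bytes
 * divided by (bytes per second + 1). The "+ 1" keeps the divisor
 * non-zero when no throughput has been measured yet. */
uint64_t eta_seconds(uint64_t bytes_remaining, uint64_t bytes_per_second)
{
    /* Guard the one input for which "+ 1" would wrap around to zero. */
    assert(bytes_per_second != UINT64_MAX);
    return bytes_remaining / (bytes_per_second + 1);
}
```

With a measured rate of zero the estimate degrades to the byte count itself, so a stalled mirror reports a very large ETA rather than dividing by zero.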
void mirror_super_destroy( struct mirror_super * super );
/* This must only be called with the start_mirror lock held */
void server_abandon_mirror( struct server * serve )
{
NULLCHECK( serve );
if ( serve->mirror_super ) {
/* FIXME: AWOOGA! RACE!
* We can set abandon_signal after mirror_super has checked it, but
* before the reset. However, mirror_reset doesn't clear abandon_signal
* so it'll just terminate early on the next pass. */
ERROR_UNLESS(
self_pipe_signal( serve->mirror->abandon_signal ),
"Failed to signal abandon to mirror"
);
pthread_t tid = serve->mirror_super->thread;
pthread_join( tid, NULL );
debug( "Mirror thread %p pthread_join returned", tid );
server_allow_mirror_start( serve );
mirror_super_destroy( serve->mirror_super );
serve->mirror = NULL;
serve->mirror_super = NULL;
debug( "Mirror supervisor done." );
}
}
int server_default_deny( struct server * serve )
{
NULLCHECK( serve );
return acl_default_deny( serve->acl );
}
/** Full lifecycle of the server */
int do_serve( struct server* params, struct self_pipe * open_signal )
{
NULLCHECK( params );
int success;
error_set_handler((cleanup_handler*) serve_cleanup, params);
serve_open_server_socket(params);
/* Only signal that we are open for business once the server
socket is open */
if ( NULL != open_signal ) { self_pipe_signal( open_signal ); }
serve_init_allocation_map(params);
serve_accept_loop(params);
success = params->success;
serve_cleanup(params, 0);
return success;
}


@@ -1,162 +0,0 @@
#ifndef SERVE_H
#define SERVE_H
#include <sys/types.h>
#include <unistd.h>
#include <signal.h> /* for sig_atomic_t */
#include "flexnbd.h"
#include "parse.h"
#include "acl.h"
static const int block_allocation_resolution = 4096;//128<<10;
struct client_tbl_entry {
pthread_t thread;
union mysockaddr address;
struct client * client;
};
#define MAX_NBD_CLIENTS 16
struct server {
/* The flexnbd wrapper this server is attached to */
struct flexnbd * flexnbd;
/** address/port to bind to */
union mysockaddr bind_to;
/** (static) file name to serve */
char* filename;
/** TCP backlog for listen() */
int tcp_backlog;
/** (static) file name of UNIX control socket (or NULL if none) */
char* control_socket_name;
/** size of file */
uint64_t size;
/** to interrupt accept loop and clients, write() to close_signal[1] */
struct self_pipe * close_signal;
/** access control list */
struct acl * acl;
/** acl_updated_signal will be signalled after the acl struct
* has been replaced
*/
struct self_pipe * acl_updated_signal;
/* Claimed around any updates to the ACL. */
struct flexthread_mutex * l_acl;
/* Claimed around starting a mirror so that it doesn't race with
* shutting down on a SIGTERM. */
struct flexthread_mutex * l_start_mirror;
struct mirror* mirror;
struct mirror_super * mirror_super;
/* This is used to stop the mirror from starting after we
* receive a SIGTERM */
int mirror_can_start;
int server_fd;
int control_fd;
/* the allocation_map keeps track of which blocks in the backing file
* have been allocated, or part-allocated on disc, with unallocated
* blocks presumed to contain zeroes (i.e. represented as sparse files
* by the filesystem). We can use this information when receiving
* incoming writes, and avoid writing zeroes to unallocated sections
* of the file which would needlessly increase disc usage. This
* bitmap will start at all-zeroes for an empty file, and tend towards
* all-ones as the file is written to (i.e. we assume that allocated
* blocks can never become unallocated again, as is the case with ext3
* at least).
*/
struct bitset * allocation_map;
/* when starting up, this thread builds the allocation_map */
pthread_t allocation_map_builder_thread;
/* when the thread has finished, it sets this to 1 */
volatile sig_atomic_t allocation_map_built;
int max_nbd_clients;
struct client_tbl_entry *nbd_client;
/** Should clients use the killswitch? */
int use_killswitch;
/** If this isn't set, newly accepted clients will be closed immediately */
int allow_new_clients;
/* Marker for whether this server has control over the data in
* the file, or if we're waiting to receive it from an inbound
* migration which hasn't yet finished.
*
* It's the value which controls the exit status of a serve or
* listen process.
*/
int success;
};
struct server * server_create(
struct flexnbd * flexnbd,
char* s_ip_address,
char* s_port,
char* s_file,
int default_deny,
int acl_entries,
char** s_acl_entries,
int max_nbd_clients,
int use_killswitch,
int success );
void server_destroy( struct server * );
int server_is_closed(struct server* serve);
void serve_signal_close( struct server *serve );
void serve_wait_for_close( struct server * serve );
void server_replace_acl( struct server *serve, struct acl * acl);
void server_control_arrived( struct server *serve );
int server_is_in_control( struct server *serve );
int server_default_deny( struct server * serve );
int server_acl_locked( struct server * serve );
void server_lock_acl( struct server *serve );
void server_unlock_acl( struct server *serve );
void server_lock_start_mirror( struct server *serve );
void server_unlock_start_mirror( struct server *serve );
int server_is_mirroring( struct server * serve );
uint64_t server_mirror_bytes_remaining( struct server * serve );
uint64_t server_mirror_eta( struct server * serve );
void server_abandon_mirror( struct server * serve );
void server_prevent_mirror_start( struct server *serve );
void server_allow_mirror_start( struct server *serve );
int server_mirror_can_start( struct server *serve );
/* These functions are used by mirror around the final pass, to close
* existing clients and prevent new ones from being around
*/
void server_forbid_new_clients( struct server *serve );
void server_close_clients( struct server *serve );
void server_join_clients( struct server *serve );
void server_allow_new_clients( struct server *serve );
/* Returns a count (ish) of the number of currently-running client threads */
int server_count_clients( struct server *params );
void server_unlink( struct server * serve );
int do_serve( struct server *, struct self_pipe * );
struct mode_readwrite_params {
union mysockaddr connect_to;
union mysockaddr connect_from;
off64_t from;
off64_t len;
int data_fd;
int client;
};
#endif

109
src/server/acl.c Normal file

@@ -0,0 +1,109 @@
#include <stdlib.h>
#include "util.h"
#include "parse.h"
#include "acl.h"
struct acl *acl_create(int len, char **lines, int default_deny)
{
struct acl *acl;
acl = (struct acl *) xmalloc(sizeof(struct acl));
acl->len = parse_acl(&acl->entries, len, lines);
acl->default_deny = default_deny;
return acl;
}
static int testmasks[9] = { 0, 128, 192, 224, 240, 248, 252, 254, 255 };
/** Test whether AF_INET or AF_INET6 sockaddr is included in the given access
* control list, returning 1 if it is, and 0 if not.
*/
static int is_included_in_acl(int list_length,
struct ip_and_mask (*list)[],
union mysockaddr *test)
{
NULLCHECK(test);
int i;
for (i = 0; i < list_length; i++) {
struct ip_and_mask *entry = &(*list)[i];
int testbits;
unsigned char *raw_address1 = NULL, *raw_address2 = NULL;
debug("checking acl entry %d (%d/%d)", i, test->generic.sa_family,
entry->ip.family);
if (test->generic.sa_family != entry->ip.family) {
continue;
}
if (test->generic.sa_family == AF_INET) {
debug("it's an AF_INET");
raw_address1 = (unsigned char *) &test->v4.sin_addr;
raw_address2 = (unsigned char *) &entry->ip.v4.sin_addr;
} else if (test->generic.sa_family == AF_INET6) {
debug("it's an AF_INET6");
raw_address1 = (unsigned char *) &test->v6.sin6_addr;
raw_address2 = (unsigned char *) &entry->ip.v6.sin6_addr;
} else {
fatal("Can't check an ACL for this address type.");
}
debug("testbits=%d", entry->mask);
for (testbits = entry->mask; testbits > 0; testbits -= 8) {
debug("testbits=%d, c1=%02x, c2=%02x", testbits,
raw_address1[0], raw_address2[0]);
if (testbits >= 8) {
if (raw_address1[0] != raw_address2[0]) {
goto no_match;
}
} else {
if ((raw_address1[0] & testmasks[testbits % 8]) !=
(raw_address2[0] & testmasks[testbits % 8])) {
goto no_match;
}
}
raw_address1++;
raw_address2++;
}
return 1;
no_match:;
debug("no match");
}
return 0;
}
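
The inner loop of is_included_in_acl() compares whole bytes while at least 8 mask bits remain, then compares the final partial byte through testmasks. A compact sketch of the same prefix comparison, assuming raw address bytes in network order; prefix_matches is a hypothetical helper, not part of the ACL code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: compare the first `bits` bits of two raw
 * addresses. Whole bytes are compared directly; a trailing partial byte
 * is compared through a mask that keeps only its top (bits % 8) bits. */
int prefix_matches(const unsigned char *a, const unsigned char *b, int bits)
{
    static const unsigned char masks[8] =
        { 0x00, 0x80, 0xC0, 0xE0, 0xF0, 0xF8, 0xFC, 0xFE };
    int full_bytes = bits / 8;
    int rest = bits % 8;
    int i;

    assert(a != NULL && b != NULL && bits >= 0);
    for (i = 0; i < full_bytes; i++) {
        if (a[i] != b[i]) {
            return 0;
        }
    }
    if (rest != 0 &&
        (a[full_bytes] & masks[rest]) != (b[full_bytes] & masks[rest])) {
        return 0;
    }
    return 1;
}
```

For example, 192.168.1.77 falls inside 192.168.1.0/24, while 192.168.2.77 does not, although both share the same /22.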
int acl_includes(struct acl *acl, union mysockaddr *addr)
{
NULLCHECK(acl);
if (0 == acl->len) {
return !(acl->default_deny);
} else {
return is_included_in_acl(acl->len, acl->entries, addr);
}
}
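
acl_includes() consults default_deny only when the list is empty; with a non-empty list, any address that matches no entry is rejected. A sketch of that decision as a pure function, where the hypothetical parameter address_matches_entry stands in for the result of is_included_in_acl():

```c
#include <assert.h>

/* Hypothetical restatement of the decision in acl_includes().
 * Returns 1 to accept the client, 0 to reject it. */
int acl_decision(int entry_count, int default_deny, int address_matches_entry)
{
    assert(entry_count >= 0);
    if (entry_count == 0) {
        /* Empty list: default_deny alone decides. */
        return !default_deny;
    }
    /* Non-empty list: only a matching entry grants access. */
    return address_matches_entry ? 1 : 0;
}
```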
int acl_default_deny(struct acl *acl)
{
NULLCHECK(acl);
return acl->default_deny;
}
void acl_destroy(struct acl *acl)
{
free(acl->entries);
acl->len = 0;
acl->entries = NULL;
free(acl);
}


@@ -4,9 +4,9 @@
#include "parse.h"
struct acl {
int len;
int default_deny;
struct ip_and_mask (*entries)[];
int len;
int default_deny;
struct ip_and_mask (*entries)[];
};
/** Allocate a new acl structure, parsing the given lines to sockaddr
@@ -17,21 +17,21 @@ struct acl {
* default_deny controls the behaviour of an empty list: if true, all
* requests will be denied. If false, all requests will be accepted.
*/
struct acl * acl_create( int len, char **lines, int default_deny );
struct acl *acl_create(int len, char **lines, int default_deny);
/** Check to see whether an address is allowed by an acl.
* See acl_create for how the default_deny setting affects this.
*/
int acl_includes( struct acl *, union mysockaddr *);
int acl_includes(struct acl *, union mysockaddr *);
/** Get the default_deny status */
int acl_default_deny( struct acl * );
int acl_default_deny(struct acl *);
/** Free the acl structure and the internal acl entries table.
*/
void acl_destroy( struct acl * );
void acl_destroy(struct acl *);
#endif

441
src/server/bitset.h Normal file

@@ -0,0 +1,441 @@
#ifndef BITSET_H
#define BITSET_H
#include "util.h"
#include <inttypes.h>
#include <string.h>
#include <pthread.h>
/*
* Make the bitfield words 'opaque' to prevent code
* poking at the bits directly without using these
* accessors/macros
*/
typedef uint64_t bitfield_word_t;
typedef bitfield_word_t *bitfield_p;
#define BITFIELD_WORD_SIZE sizeof(bitfield_word_t)
#define BITS_PER_WORD (BITFIELD_WORD_SIZE * 8)
#define BIT_MASK(_idx) \
(1ULL << ((_idx) & (BITS_PER_WORD - 1)))
#define BIT_WORD(_b, _idx) \
((bitfield_word_t*)(_b))[(_idx) / BITS_PER_WORD]
/* Calculates the number of words needed to store _bytes bytes. This
* exists to accommodate code that works in byte sizes.
*/
#define BIT_WORDS_FOR_SIZE(_bytes) \
(((_bytes) + (BITFIELD_WORD_SIZE-1)) / BITFIELD_WORD_SIZE)
/** Return the bit value ''idx'' in array ''b'' */
static inline int bit_get(bitfield_p b, uint64_t idx)
{
return (BIT_WORD(b, idx) >> (idx & (BITS_PER_WORD - 1))) & 1;
}
/** Return 1 if the bit at ''idx'' in array ''b'' is set */
static inline int bit_is_set(bitfield_p b, uint64_t idx)
{
return bit_get(b, idx);
}
/** Return 1 if the bit at ''idx'' in array ''b'' is clear */
static inline int bit_is_clear(bitfield_p b, uint64_t idx)
{
return !bit_get(b, idx);
}
/** Tests whether the bit at ''idx'' in array ''b'' has value ''value'' */
static inline int bit_has_value(bitfield_p b, uint64_t idx, int value)
{
return bit_get(b, idx) == !!value;
}
/** Sets the bit ''idx'' in array ''b'' */
static inline void bit_set(bitfield_p b, uint64_t idx)
{
BIT_WORD(b, idx) |= BIT_MASK(idx);
}
/** Clears the bit ''idx'' in array ''b'' */
static inline void bit_clear(bitfield_p b, uint64_t idx)
{
BIT_WORD(b, idx) &= ~BIT_MASK(idx);
}
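
The accessors above split a bit index into a word index (idx / 64) and a position inside that word (idx % 64). A self-contained sketch of that split on a plain uint64_t array; the demo_ names are hypothetical and do not replace the bit_* functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WORD_BITS 64u

/* Hypothetical helpers mirroring the word/mask split: the word index is
 * idx / 64 and the bit inside that word is idx % 64. */
void demo_bit_set(uint64_t *words, uint64_t idx)
{
    assert(words != NULL);
    words[idx / WORD_BITS] |= (UINT64_C(1) << (idx % WORD_BITS));
}

void demo_bit_clear(uint64_t *words, uint64_t idx)
{
    assert(words != NULL);
    words[idx / WORD_BITS] &= ~(UINT64_C(1) << (idx % WORD_BITS));
}

int demo_bit_get(const uint64_t *words, uint64_t idx)
{
    assert(words != NULL);
    return (int) ((words[idx / WORD_BITS] >> (idx % WORD_BITS)) & 1u);
}
```

Setting bit 70 touches only the second word (70 / 64 = 1) at position 6 (70 % 64 = 6). The unsigned constant matters for position 63, where a signed 1 shifted left would overflow.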
/** Sets ''len'' bits in array ''b'' starting at offset ''from'' */
static inline void bit_set_range(bitfield_p b, uint64_t from, uint64_t len)
{
for (; (from % BITS_PER_WORD) != 0 && len > 0; len--) {
bit_set(b, from++);
}
if (len >= BITS_PER_WORD) {
memset(&BIT_WORD(b, from), 0xff, len / 8);
from += len;
len = len % BITS_PER_WORD;
from -= len;
}
for (; len > 0; len--) {
bit_set(b, from++);
}
}
/** Clears ''len'' bits in array ''b'' starting at offset ''from'' */
static inline void bit_clear_range(bitfield_p b, uint64_t from,
uint64_t len)
{
for (; (from % BITS_PER_WORD) != 0 && len > 0; len--) {
bit_clear(b, from++);
}
if (len >= BITS_PER_WORD) {
memset(&BIT_WORD(b, from), 0, len / 8);
from += len;
len = len % BITS_PER_WORD;
from -= len;
}
for (; len > 0; len--) {
bit_clear(b, from++);
}
}
/** Counts the number of contiguous bits in array ''b'', starting at ''from''
* up to a maximum number of bits ''len''. Returns the number of contiguous
* bits that are the same as the first one specified. If ''run_is_set'' is
* non-NULL, the value of that bit is placed into it.
*/
static inline uint64_t bit_run_count(bitfield_p b, uint64_t from,
uint64_t len, int *run_is_set)
{
uint64_t count = 0;
int first_value = bit_get(b, from);
bitfield_word_t word_match = first_value ? -1 : 0;
if (run_is_set != NULL) {
*run_is_set = first_value;
}
for (; ((from + count) % BITS_PER_WORD) != 0 && len > 0; len--) {
if (bit_has_value(b, from + count, first_value)) {
count++;
} else {
return count;
}
}
for (; len >= BITS_PER_WORD; len -= BITS_PER_WORD) {
if (BIT_WORD(b, from + count) == word_match) {
count += BITS_PER_WORD;
} else {
break;
}
}
for (; len > 0; len--) {
if (bit_has_value(b, from + count, first_value)) {
count++;
} else {
return count;
}
}
return count;
}
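
bit_run_count() returns how many consecutive bits, starting at from, share the value of the first bit. A deliberately simple bit-by-bit sketch of the same contract, without the whole-word fast path; run_length is a hypothetical name and it assumes len is at least 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference version: count consecutive bits, starting at
 * `from` and looking at no more than `len` bits, that equal the first
 * bit. Stops at the first bit that differs. */
uint64_t run_length(const uint64_t *words, uint64_t from, uint64_t len)
{
    uint64_t count = 0;
    int first;

    assert(words != NULL && len > 0);
    first = (int) ((words[from / 64] >> (from % 64)) & 1u);

    while (count < len) {
        uint64_t idx = from + count;
        int bit = (int) ((words[idx / 64] >> (idx % 64)) & 1u);
        if (bit != first) {
            break;
        }
        count++;
    }
    return count;
}
```

A slow version like this is handy as an oracle when testing the optimised function against random bit patterns.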
enum bitset_stream_events {
BITSET_STREAM_UNSET = 0,
BITSET_STREAM_SET = 1,
BITSET_STREAM_ON = 2,
BITSET_STREAM_OFF = 3
};
#define BITSET_STREAM_EVENTS_ENUM_SIZE 4
struct bitset_stream_entry {
enum bitset_stream_events event;
uint64_t from;
uint64_t len;
};
/** Limit the stream size to 1MB for now.
*
* If this is too small, it'll cause requests to stall as the migration lags
* behind the changes made by those requests.
*/
#define BITSET_STREAM_SIZE ( ( 1024 * 1024 ) / sizeof( struct bitset_stream_entry ) )
struct bitset_stream {
struct bitset_stream_entry entries[BITSET_STREAM_SIZE];
int in;
int out;
int size;
pthread_mutex_t mutex;
pthread_cond_t cond_not_full;
pthread_cond_t cond_not_empty;
uint64_t queued_bytes[BITSET_STREAM_EVENTS_ENUM_SIZE];
};
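
The in, out and size fields of struct bitset_stream describe a fixed-size circular queue. A single-threaded sketch of that layout, assuming the consumer advances out the same way the producer advances in (the dequeue side is not shown in this section). The ring_ names and the tiny capacity are hypothetical, and where the real code waits on a condition variable this version simply returns 0.

```c
#include <assert.h>
#include <stddef.h>

#define RING_CAPACITY 4

struct ring {
    int entries[RING_CAPACITY];
    int in;     /* next slot to write */
    int out;    /* next slot to read */
    int size;   /* number of queued entries */
};

/* Returns 1 on success, 0 if the ring is full. */
int ring_push(struct ring *r, int value)
{
    assert(r != NULL);
    if (r->size == RING_CAPACITY) {
        return 0;
    }
    r->entries[r->in] = value;
    r->in = (r->in + 1) % RING_CAPACITY;    /* wrap around at the end */
    r->size++;
    return 1;
}

/* Returns 1 and stores the oldest entry in *value, or 0 if empty. */
int ring_pop(struct ring *r, int *value)
{
    assert(r != NULL && value != NULL);
    if (r->size == 0) {
        return 0;
    }
    *value = r->entries[r->out];
    r->out = (r->out + 1) % RING_CAPACITY;
    r->size--;
    return 1;
}
```

Tracking size separately from in and out lets the queue use every slot, since in == out is otherwise ambiguous between full and empty.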
/** An application of a bitset - a bitset mapping represents a file of ''size''
* broken down into ''resolution''-sized chunks. The bit set is assumed to
* represent one bit per chunk. We also bundle a lock so that the set can be
* written reliably by multiple threads.
*/
struct bitset {
pthread_mutex_t lock;
uint64_t size;
int resolution;
struct bitset_stream *stream;
int stream_enabled;
bitfield_word_t bits[];
};
/** Allocate a bitset for a file of the given size, and chunks of the
* given resolution.
*/
static inline struct bitset *bitset_alloc(uint64_t size, int resolution)
{
// calculate a size to allocate that is a multiple of the size of the
// bitfield word
size_t bitfield_size =
BIT_WORDS_FOR_SIZE(((size + resolution -
1) / resolution)) * sizeof(bitfield_word_t);
struct bitset *bitset =
xmalloc(sizeof(struct bitset) + (bitfield_size / 8));
bitset->size = size;
bitset->resolution = resolution;
/* don't actually need to call pthread_mutex_destroy */
pthread_mutex_init(&bitset->lock, NULL);
bitset->stream = xmalloc(sizeof(struct bitset_stream));
pthread_mutex_init(&bitset->stream->mutex, NULL);
/* Technically don't need to call pthread_cond_destroy either */
pthread_cond_init(&bitset->stream->cond_not_full, NULL);
pthread_cond_init(&bitset->stream->cond_not_empty, NULL);
return bitset;
}
static inline void bitset_free(struct bitset *set)
{
/* TODO: free our mutex... */
free(set->stream);
set->stream = NULL;
free(set);
}
#define INT_FIRST_AND_LAST \
uint64_t first = from/set->resolution, \
last = ((from+len)-1)/set->resolution, \
bitlen = (last-first)+1
#define BITSET_LOCK \
FATAL_IF_NEGATIVE(pthread_mutex_lock(&set->lock), "Error locking bitset")
#define BITSET_UNLOCK \
FATAL_IF_NEGATIVE(pthread_mutex_unlock(&set->lock), "Error unlocking bitset")
static inline void bitset_stream_enqueue(struct bitset *set,
enum bitset_stream_events event,
uint64_t from, uint64_t len)
{
struct bitset_stream *stream = set->stream;
pthread_mutex_lock(&stream->mutex);
while (stream->size == BITSET_STREAM_SIZE) {
pthread_cond_wait(&stream->cond_not_full, &stream->mutex);
}
stream->entries[stream->in].event = event;
stream->entries[stream->in].from = from;
stream->entries[stream->in].len = len;
stream->queued_bytes[event] += len;
stream->size++;
stream->in++;
stream->in %= BITSET_STREAM_SIZE;
pthread_mutex_unlock(&stream->mutex);
pthread_cond_signal(&stream->cond_not_empty);
return;
}
static inline void bitset_stream_dequeue(struct bitset *set,
struct bitset_stream_entry *out)
{
struct bitset_stream *stream = set->stream;
struct bitset_stream_entry *dequeued;
pthread_mutex_lock(&stream->mutex);
while (stream->size == 0) {
pthread_cond_wait(&stream->cond_not_empty, &stream->mutex);
}
dequeued = &stream->entries[stream->out];
if (out != NULL) {
out->event = dequeued->event;
out->from = dequeued->from;
out->len = dequeued->len;
}
stream->queued_bytes[dequeued->event] -= dequeued->len;
stream->size--;
stream->out++;
stream->out %= BITSET_STREAM_SIZE;
pthread_mutex_unlock(&stream->mutex);
pthread_cond_signal(&stream->cond_not_full);
return;
}
static inline size_t bitset_stream_size(struct bitset *set)
{
size_t size;
pthread_mutex_lock(&set->stream->mutex);
size = set->stream->size;
pthread_mutex_unlock(&set->stream->mutex);
return size;
}
static inline uint64_t bitset_stream_queued_bytes(struct bitset *set,
enum bitset_stream_events
event)
{
uint64_t total;
pthread_mutex_lock(&set->stream->mutex);
total = set->stream->queued_bytes[event];
pthread_mutex_unlock(&set->stream->mutex);
return total;
}
static inline void bitset_enable_stream(struct bitset *set)
{
BITSET_LOCK;
set->stream_enabled = 1;
bitset_stream_enqueue(set, BITSET_STREAM_ON, 0, set->size);
BITSET_UNLOCK;
}
static inline void bitset_disable_stream(struct bitset *set)
{
BITSET_LOCK;
bitset_stream_enqueue(set, BITSET_STREAM_OFF, 0, set->size);
set->stream_enabled = 0;
BITSET_UNLOCK;
}
/** Set the bits in a bitset which correspond to the given bytes in the larger
* file.
*/
static inline void bitset_set_range(struct bitset *set,
uint64_t from, uint64_t len)
{
INT_FIRST_AND_LAST;
BITSET_LOCK;
bit_set_range(set->bits, first, bitlen);
if (set->stream_enabled) {
bitset_stream_enqueue(set, BITSET_STREAM_SET, from, len);
}
BITSET_UNLOCK;
}
/** Set every bit in the bitset. */
static inline void bitset_set(struct bitset *set)
{
bitset_set_range(set, 0, set->size);
}
/** Clear the bits in a bitset which correspond to the given bytes in the
* larger file.
*/
static inline void bitset_clear_range(struct bitset *set,
uint64_t from, uint64_t len)
{
INT_FIRST_AND_LAST;
BITSET_LOCK;
bit_clear_range(set->bits, first, bitlen);
if (set->stream_enabled) {
bitset_stream_enqueue(set, BITSET_STREAM_UNSET, from, len);
}
BITSET_UNLOCK;
}
/** Clear every bit in the bitset. */
static inline void bitset_clear(struct bitset *set)
{
bitset_clear_range(set, 0, set->size);
}
/** As per bitset_run_count but also tells you whether the run it found was set
* or unset, atomically.
*/
static inline uint64_t bitset_run_count_ex(struct bitset *set,
uint64_t from,
uint64_t len, int *run_is_set)
{
uint64_t run;
/* Clip our requests to the end of the bitset, avoiding uint underflow. */
if (from > set->size) {
return 0;
}
len = (len + from) > set->size ? (set->size - from) : len;
INT_FIRST_AND_LAST;
BITSET_LOCK;
run =
bit_run_count(set->bits, first, bitlen,
run_is_set) * set->resolution;
run -= (from % set->resolution);
BITSET_UNLOCK;
return run;
}
/** Counts the number of contiguous bytes that are represented as a run in
* the bit field.
*/
static inline uint64_t bitset_run_count(struct bitset *set,
uint64_t from, uint64_t len)
{
return bitset_run_count_ex(set, from, len, NULL);
}
/** Tests whether the bit field is clear for the given file offset.
*/
static inline int bitset_is_clear_at(struct bitset *set, uint64_t at)
{
return bit_is_clear(set->bits, at / set->resolution);
}
/** Tests whether the bit field is set for the given file offset.
*/
static inline int bitset_is_set_at(struct bitset *set, uint64_t at)
{
return bit_is_set(set->bits, at / set->resolution);
}
#endif
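
The byte-to-chunk arithmetic used by INT_FIRST_AND_LAST and bitset_run_count_ex can be sketched in isolation. This is a minimal illustration, not part of the flexnbd sources: the names chunk_span, chunk_span_for and run_bits_to_bytes are hypothetical, and the macro in the header remains the real implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type; bitset.h uses the INT_FIRST_AND_LAST macro instead. */
struct chunk_span {
    uint64_t first;   /* index of the first chunk (bit) touched */
    uint64_t last;    /* index of the last chunk (bit) touched */
    uint64_t bitlen;  /* number of chunks (bits) covered */
};

/* Map the byte range [from, from + len) onto chunk indices. */
struct chunk_span chunk_span_for(uint64_t from, uint64_t len,
                                 uint64_t resolution)
{
    assert(resolution > 0 && len > 0);
    struct chunk_span s;
    s.first = from / resolution;
    s.last = (from + len - 1) / resolution;
    s.bitlen = s.last - s.first + 1;
    return s;
}

/* Convert a run of run_bits chunks, beginning in the chunk that contains
 * from, back into a byte count measured from from itself. This mirrors
 * the final two steps of bitset_run_count_ex. */
uint64_t run_bits_to_bytes(uint64_t run_bits, uint64_t from,
                           uint64_t resolution)
{
    assert(resolution > 0 && run_bits > 0);
    return run_bits * resolution - (from % resolution);
}
```

With a resolution of 4096, a request for 10000 bytes at offset 5000 covers chunks 1 to 3. A run of three chunks converts back to 11384 bytes, which is longer than the request because the run extends to the end of the last chunk. That is why callers such as write_not_zeroes clamp the run to len.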

726
src/server/client.c Normal file

@@ -0,0 +1,726 @@
#include "client.h"
#include "serve.h"
#include "ioutil.h"
#include "sockutil.h"
#include "util.h"
#include "bitset.h"
#include "nbdtypes.h"
#include "self_pipe.h"
#include <sys/mman.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
// When this signal handler runs, we call shutdown() on the client fd, which
// results in the client thread being wound up
void client_killswitch_hit(int signal
__attribute__ ((unused)), siginfo_t * info,
void *ptr __attribute__ ((unused)))
{
int fd = info->si_value.sival_int;
warn("Killswitch for fd %i activated, calling shutdown on socket", fd);
FATAL_IF(-1 == shutdown(fd, SHUT_RDWR),
SHOW_ERRNO
("Failed to shutdown() the socket, killing the server")
);
}
struct client *client_create(struct server *serve, int socket)
{
NULLCHECK(serve);
struct client *c;
struct sigevent evp = {
.sigev_notify = SIGEV_SIGNAL,
.sigev_signo = CLIENT_KILLSWITCH_SIGNAL
};
/*
* Our killswitch calls shutdown() on this socket, forcing read() and
* write() calls blocked on it to return with an error. The thread then
* close()s the socket itself, avoiding races.
*/
evp.sigev_value.sival_int = socket;
c = xmalloc(sizeof(struct client));
c->stopped = 0;
c->socket = socket;
c->serve = serve;
c->stop_signal = self_pipe_create();
FATAL_IF_NEGATIVE(timer_create
(CLOCK_MONOTONIC, &evp, &(c->killswitch)),
SHOW_ERRNO("Failed to create killswitch timer")
);
debug("Alloced client %p with socket %d", c, socket);
return c;
}
void client_signal_stop(struct client *c)
{
NULLCHECK(c);
debug("client %p: signal stop (%d, %d)", c, c->stop_signal->read_fd,
c->stop_signal->write_fd);
self_pipe_signal(c->stop_signal);
}
void client_destroy(struct client *client)
{
NULLCHECK(client);
FATAL_IF_NEGATIVE(timer_delete(client->killswitch),
SHOW_ERRNO("Couldn't delete killswitch")
);
debug("Destroying stop signal for client %p", client);
self_pipe_destroy(client->stop_signal);
debug("Freeing client %p", client);
free(client);
}
/**
* So waiting on client->socket is len bytes of data, and we must write it all
* to client->mapped. However while doing so we must consult the bitmap
* client->serve->allocation_map, which is a bitmap where one bit represents
* block_allocation_resolution bytes. Where a bit isn't set, there are no
* disc blocks allocated for that portion of the file, and we'd like to keep
* it that way.
*
* If the bitmap shows that every block in our prospective write is already
* allocated, we can proceed as normal and make one call to writeloop.
*
*/
void write_not_zeroes(struct client *client, uint64_t from, uint64_t len)
{
NULLCHECK(client);
NULLCHECK(client->serve);
NULLCHECK(client->serve->allocation_map);
struct bitset *map = client->serve->allocation_map;
while (len > 0) {
/* so we have to calculate how much of our input to consider
* next based on the bitmap of allocated blocks. This will be
* at a coarser resolution than the actual write, which may
* not fall on a block boundary at either end. So we look up
* how many blocks our write covers, then cut off the start
* and end to get the exact number of bytes.
*/
uint64_t run = bitset_run_count(map, from, len);
debug("write_not_zeroes: from=%" PRIu64 ", len=%" PRIu64 ", run=%" PRIu64,
from, len, run);
if (run > len) {
run = len;
debug("(run adjusted to %" PRIu64 ")", run);
}
/*
// Useful but expensive
if (0)
{
uint64_t i;
fprintf(stderr, "full map resolution=%d: ", map->resolution);
for (i=0; i<client->serve->size; i+=map->resolution) {
int here = (from >= i && from < i+map->resolution);
if (here) { fprintf(stderr, ">"); }
fprintf(stderr, bitset_is_set_at(map, i) ? "1" : "0");
if (here) { fprintf(stderr, "<"); }
}
fprintf(stderr, "\n");
}
*/
#define DO_READ(dst, len) ERROR_IF_NEGATIVE( \
readloop( \
client->socket, \
(dst), \
(len) \
), \
SHOW_ERRNO("read failed %" PRIu64 "+%" PRIu64, from, (uint64_t) (len)) \
)
if (bitset_is_set_at(map, from)) {
debug("writing the lot: from=%" PRIu64 ", run=%" PRIu64, from, run);
/* already allocated, just write it all */
DO_READ(client->mapped + from, run);
/* We know from our earlier call to bitset_run_count that the
* bitset is all-1s at this point, but we need to dirty it for the
* sake of the event stream - the actual bytes have changed, and we
* are interested in that fact.
*/
bitset_set_range(map, from, run);
len -= run;
from += run;
} else {
char zerobuffer[block_allocation_resolution];
/* not allocated, read in block_allocation_resolution */
while (run > 0) {
uint64_t blockrun = block_allocation_resolution -
(from % block_allocation_resolution);
if (blockrun > run)
blockrun = run;
DO_READ(zerobuffer, blockrun);
/* This reads the buffer twice in the worst case
* but we're leaning on memcmp failing early
* and memcpy being fast, rather than trying to
* hand-optimize something specific.
*/
int all_zeros = (zerobuffer[0] == 0) &&
(0 ==
memcmp(zerobuffer, zerobuffer + 1, blockrun - 1));
if (!all_zeros) {
memcpy(client->mapped + from, zerobuffer, blockrun);
bitset_set_range(map, from, blockrun);
/* at this point we could choose to
* short-cut the rest of the write for
* faster I/O but by continuing to do it
* the slow way we preserve as much
* sparseness as possible.
*/
}
/* When the block is all_zeroes, no bytes have changed, so we
* don't need to put an event into the bitset stream. This may
* be surprising in the future.
*/
len -= blockrun;
run -= blockrun;
from += blockrun;
}
}
}
}
int fd_read_request(int fd, struct nbd_request_raw *out_request)
{
return readloop(fd, out_request, sizeof(struct nbd_request_raw));
}
/* Returns 1 if *out_request was filled with a valid request which we should
* try to honour. 0 otherwise. */
int client_read_request(struct client *client,
struct nbd_request *out_request, int *disconnected)
{
NULLCHECK(client);
NULLCHECK(out_request);
struct nbd_request_raw request_raw;
if (fd_read_request(client->socket, &request_raw) == -1) {
*disconnected = 1;
switch (errno) {
case 0:
warn(SHOW_ERRNO("EOF while reading request"));
return 0;
case ECONNRESET:
warn(SHOW_ERRNO("Connection reset while reading request"));
return 0;
case ETIMEDOUT:
warn(SHOW_ERRNO("Connection timed out while reading request"));
return 0;
default:
/* FIXME: I've seen this happen, but I
* couldn't reproduce it so I'm leaving
* it here with a better debug output in
* the hope it'll spontaneously happen
* again. It should *probably* be an
* error() call, but I want to be sure.
* */
fatal(SHOW_ERRNO("Error reading request"));
}
}
nbd_r2h_request(&request_raw, out_request);
return 1;
}
int fd_write_reply(int fd, uint64_t handle, int error)
{
struct nbd_reply reply;
struct nbd_reply_raw reply_raw;
reply.magic = REPLY_MAGIC;
reply.error = error;
reply.handle.w = handle;
nbd_h2r_reply(&reply, &reply_raw);
debug("Replying with handle=0x%016" PRIX64 ", error=%d", handle, error);
if (-1 == writeloop(fd, &reply_raw, sizeof(reply_raw))) {
switch (errno) {
case ECONNRESET:
error(SHOW_ERRNO("Connection reset while writing reply"));
break;
case EBADF:
fatal(SHOW_ERRNO
("Tried to write to an invalid file descriptor"));
break;
case EPIPE:
error(SHOW_ERRNO("Remote end closed"));
break;
default:
fatal(SHOW_ERRNO("Unhandled error while writing"));
}
}
return 1;
}
/* Writes a reply to request *request, with error, to the client's
* socket.
* Returns 1; we don't check for errors on the write.
* TODO: Check for errors on the write.
*/
int client_write_reply(struct client *client, struct nbd_request *request,
int error)
{
return fd_write_reply(client->socket, request->handle.w, error);
}
void client_write_init(struct client *client, uint64_t size)
{
struct nbd_init init = { {0} };
struct nbd_init_raw init_raw = { {0} };
memcpy(init.passwd, INIT_PASSWD, sizeof(init.passwd));
init.magic = INIT_MAGIC;
init.size = size;
/* As more features are implemented, this is the place to advertise
* them.
*/
init.flags = FLAG_HAS_FLAGS | FLAG_SEND_FLUSH | FLAG_SEND_FUA;
memset(init.reserved, 0, 124);
nbd_h2r_init(&init, &init_raw);
ERROR_IF_NEGATIVE(writeloop
(client->socket, &init_raw, sizeof(init_raw)),
SHOW_ERRNO("Couldn't send hello"));
}
/* Remove len bytes from the client socket. This is needed when the
* client sends a write we can't honour - we need to get rid of the
* bytes they've already written before we can look for another request.
*/
void client_flush(struct client *client, size_t len)
{
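int devnull = open("/dev/null", O_WRONLY);
FATAL_IF_NEGATIVE(devnull, SHOW_ERRNO("Couldn't open /dev/null"));
int pipes[2];
FATAL_IF_NEGATIVE(pipe(pipes), SHOW_ERRNO("Couldn't create pipe"));
const unsigned int flags = SPLICE_F_MORE | SPLICE_F_MOVE;
size_t spliced = 0;
while (spliced < len) {
ssize_t received = splice(client->socket, NULL,
pipes[1], NULL,
len - spliced, flags);
FATAL_IF_NEGATIVE(received, SHOW_ERRNO("splice error"));
if (received == 0) {
/* EOF: the peer has gone away, so there is nothing left to discard */
break;
}
ssize_t junked = 0;
while (junked < received) {
ssize_t junk;
/* only ask for the bytes still sitting in the pipe */
junk = splice(pipes[0], NULL, devnull, NULL,
received - junked, flags);
FATAL_IF_NEGATIVE(junk, SHOW_ERRNO("splice error"));
junked += junk;
}
spliced += received;
}
debug("Flushed %zu of %zu bytes", spliced, len);
/* close both pipe ends as well as /dev/null so we don't leak fds */
close(pipes[0]);
close(pipes[1]);
close(devnull);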
}
/* Check to see if the client's request needs a reply constructing.
* Returns 1 if we do, 0 otherwise.
* client->disconnect is set to 1 if the client sent a bad magic number or
* asked to disconnect, in which case we drop the connection.
*/
int client_request_needs_reply(struct client *client,
struct nbd_request request)
{
/* The client is stupid, but don't take down the whole server as a result.
* We send a reply before disconnecting so that at least some indication of
* the problem is visible, and so proxies don't retry the same (bad) request
* forever.
*/
if (request.magic != REQUEST_MAGIC) {
warn("Bad magic 0x%08X from client", request.magic);
client_write_reply(client, &request, EBADMSG);
client->disconnect = 1; // no need to flush
return 0;
}
debug("request type=%" PRIu16 ", flags=%" PRIu16 ", from=%" PRIu64
", len=%" PRIu32 ", handle=0x%016" PRIX64, request.type, request.flags,
request.from, request.len, request.handle.w);
/* check it's not out of range. NBD protocol requires ENOSPC to be
* returned in this instance
*/
if (request.from + request.len > client->serve->size) {
warn("write request %" PRIu64 "+%" PRIu32 " out of range",
request.from, request.len);
if (request.type == REQUEST_WRITE) {
client_flush(client, request.len);
}
client_write_reply(client, &request, ENOSPC);
client->disconnect = 0;
return 0;
}
switch (request.type) {
case REQUEST_READ:
break;
case REQUEST_WRITE:
break;
case REQUEST_DISCONNECT:
debug("request disconnect");
client->disconnect = 1;
return 0;
case REQUEST_FLUSH:
break;
default:
/* NBD protocol says servers SHOULD return EINVAL to unknown
* commands */
warn("Unknown request 0x%08X", request.type);
client_write_reply(client, &request, EINVAL);
client->disconnect = 0;
return 0;
}
return 1;
}
void client_reply_to_read(struct client *client,
struct nbd_request request)
{
off64_t offset;
debug("request read %" PRIu64 "+%" PRIu32, request.from, request.len);
sock_set_tcp_cork(client->socket, 1);
client_write_reply(client, &request, 0);
offset = request.from;
/* If we get cut off partway through this sendfile, we don't
* want to kill the server. This should be an error.
*/
ERROR_IF_NEGATIVE(sendfileloop(client->socket,
client->fileno,
&offset,
request.len),
"sendfile failed from=%" PRId64 ", len=%" PRIu32,
(int64_t) offset, request.len);
sock_set_tcp_cork(client->socket, 0);
}
void client_reply_to_write(struct client *client,
struct nbd_request request)
{
debug("request write from=%" PRIu64 ", len=%" PRIu32 ", handle=0x%016" PRIX64,
request.from, request.len, request.handle.w);
if (client->serve->allocation_map_built) {
write_not_zeroes(client, request.from, request.len);
} else {
debug("No allocation map, writing directly.");
/* If we get cut off partway through reading this data:
* */
ERROR_IF_NEGATIVE(readloop(client->socket,
client->mapped + request.from,
request.len),
SHOW_ERRNO("reading write data failed from=%" PRIu64 ", len=%" PRIu32,
request.from, request.len));
/* the allocation_map is shared between client threads, and may be
* being built. We need to reflect the write in it, as it may be in
* a position the builder has already gone over.
*/
bitset_set_range(client->serve->allocation_map, request.from,
request.len);
}
// Only flush if FUA is set -- overridden for now to force flush after each
// write.
// if (request.flags & CMD_FLAG_FUA) {
if (1) {
/* multiple of page size */
uint64_t from_rounded =
request.from & (~(sysconf(_SC_PAGE_SIZE) - 1));
uint64_t len_rounded = request.len + (request.from - from_rounded);
debug("Calling msync from=%" PRIu64 ", len=%" PRIu64 "",
from_rounded, len_rounded);
FATAL_IF_NEGATIVE(msync(client->mapped + from_rounded,
len_rounded,
MS_SYNC | MS_INVALIDATE),
"msync failed %" PRIu64 " %" PRIu32, request.from,
request.len);
}
client_write_reply(client, &request, 0);
}
void client_reply_to_flush(struct client *client,
struct nbd_request request)
{
debug("request flush from=%" PRIu64 ", len=%" PRIu32 ", handle=0x%016" PRIX64,
request.from, request.len, request.handle.w);
ERROR_IF_NEGATIVE(msync
(client->mapped, client->mapped_size,
MS_SYNC | MS_INVALIDATE), "flush failed");
client_write_reply(client, &request, 0);
}
void client_reply(struct client *client, struct nbd_request request)
{
switch (request.type) {
case REQUEST_READ:
client_reply_to_read(client, request);
break;
case REQUEST_WRITE:
client_reply_to_write(client, request);
break;
case REQUEST_FLUSH:
client_reply_to_flush(client, request);
break;
}
}
/* Starts a timer that fires CLIENT_KILLSWITCH_SIGNAL if disarm is not called
* within a timeout (see CLIENT_HANDLER_TIMEOUT). The signal handler,
* client_killswitch_hit(), then shuts down the client socket.
*/
void client_arm_killswitch(struct client *client)
{
struct itimerspec its = {
.it_value = {.tv_nsec = 0,.tv_sec = CLIENT_HANDLER_TIMEOUT},
.it_interval = {.tv_nsec = 0,.tv_sec = 0}
};
if (!client->serve->use_killswitch) {
return;
}
debug("Arming killswitch");
FATAL_IF_NEGATIVE(timer_settime(client->killswitch, 0, &its, NULL),
SHOW_ERRNO("Failed to arm killswitch")
);
return;
}
void client_disarm_killswitch(struct client *client)
{
struct itimerspec its = {
.it_value = {.tv_nsec = 0,.tv_sec = 0},
.it_interval = {.tv_nsec = 0,.tv_sec = 0}
};
if (!client->serve->use_killswitch) {
return;
}
debug("Disarming killswitch");
FATAL_IF_NEGATIVE(timer_settime(client->killswitch, 0, &its, NULL),
SHOW_ERRNO("Failed to disarm killswitch")
);
return;
}
/* Returns 0 if we should continue trying to serve requests */
int client_serve_request(struct client *client)
{
struct nbd_request request = { 0 };
int stop = 1;
int disconnected = 0;
fd_set rfds, efds;
int fd_count;
/* wait until there are some bytes on the fd before committing to reads
* FIXME: this whole scheme is broken because we're using blocking reads.
* read() can block directly after a select anyway, and it's possible that,
* without the killswitch, we'd hang forever. With the killswitch, we just
* hang for "a while". The Right Thing to do is to rewrite client.c to be
* non-blocking.
*/
FD_ZERO(&rfds);
FD_SET(client->socket, &rfds);
self_pipe_fd_set(client->stop_signal, &rfds);
FD_ZERO(&efds);
FD_SET(client->socket, &efds);
fd_count = sock_try_select(FD_SETSIZE, &rfds, NULL, &efds, NULL);
if (fd_count == 0) {
/* This "can't ever happen" */
fatal("No FDs selected, and no timeout!");
} else if (fd_count < 0) {
fatal("Select failed");
}
if (self_pipe_fd_isset(client->stop_signal, &rfds)) {
debug("Client received stop signal.");
return 1; // Don't try to serve more requests
}
if (FD_ISSET(client->socket, &efds)) {
debug("Client connection closed");
return 1;
}
/* We arm / disarm around the whole request cycle. The reason for this is
* that the remote peer could uncleanly die at any point; if we're stuck on
* a blocking read(), then that will hang for (almost) forever. This is bad
* in general, makes the server respond only to kill -9, and breaks
* outward mirroring in a most unpleasant way.
*
* Don't forget to disarm before exiting, no matter what!
*
* Reproducing the hang is simple: open a connection to the flexnbd server,
* write a single byte, and then wait.
*
*/
client_arm_killswitch(client);
if (!client_read_request(client, &request, &disconnected)) {
client_disarm_killswitch(client);
return stop;
}
if (disconnected) {
client_disarm_killswitch(client);
return stop;
}
if (!client_request_needs_reply(client, request)) {
client_disarm_killswitch(client);
return client->disconnect;
}
{
if (!server_is_closed(client->serve)) {
client_reply(client, request);
stop = 0;
}
}
client_disarm_killswitch(client);
return stop;
}
void client_send_hello(struct client *client)
{
client_write_init(client, client->serve->size);
}
void client_cleanup(struct client *client,
int fatal __attribute__ ((unused)))
{
info("client cleanup for client %p", client);
/* If the thread hits an error, we need to ensure this is off */
client_disarm_killswitch(client);
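/* Like the original truthiness tests, this assumes an unopened fd is 0.
* Testing for > 0 also skips fds already marked closed (-1), so a second
* cleanup call does not try to close them again.
*/
if (client->socket > 0) {
FATAL_IF_NEGATIVE(close(client->socket),
"Error closing client socket %d",
client->socket);
debug("Closed client socket fd %d", client->socket);
client->socket = -1;
}
if (client->mapped) {
munmap(client->mapped, client->serve->size);
client->mapped = NULL;
}
if (client->fileno > 0) {
FATAL_IF_NEGATIVE(close(client->fileno),
"Error closing file %d", client->fileno);
debug("Closed client file fd %d", client->fileno);
client->fileno = -1;
}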
if (server_acl_locked(client->serve)) {
server_unlock_acl(client->serve);
}
}
void *client_serve(void *client_uncast)
{
struct client *client = (struct client *) client_uncast;
error_set_handler((cleanup_handler *) client_cleanup, client);
info("client: mmaping file");
FATAL_IF_NEGATIVE(open_and_mmap(client->serve->filename,
&client->fileno,
&client->mapped_size,
(void **) &client->mapped),
SHOW_ERRNO("Couldn't open/mmap file %s",
client->serve->filename)
);
FATAL_IF_NEGATIVE(madvise
(client->mapped, client->serve->size, MADV_RANDOM),
SHOW_ERRNO("Failed to madvise() %s",
client->serve->filename)
);
debug("Opened client file fd %d", client->fileno);
debug("client: sending hello");
client_send_hello(client);
debug("client: serving requests");
while (client_serve_request(client) == 0);
debug("client: stopped serving requests");
client->stopped = 1;
if (client->disconnect) {
debug("client: control arrived");
server_control_arrived(client->serve);
}
debug("Cleaning client %p up normally in thread %p", client,
pthread_self());
client_cleanup(client, 0);
debug("Client thread done");
return NULL;
}
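
The all-zeros test in write_not_zeroes relies on a compact memcmp idiom: if the first byte is zero and every byte equals its successor, the whole buffer is zero. A minimal standalone sketch follows. The function name buffer_is_all_zeros is hypothetical, and the n == 0 guard is an addition for safety that the original call site does not need, since blockrun is always at least 1 there.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if the first n bytes of buf are all zero, 0 otherwise. */
int buffer_is_all_zeros(const char *buf, size_t n)
{
    assert(buf != NULL);
    if (n == 0) {
        return 1;   /* an empty range contains no non-zero bytes */
    }
    /* Compare buf[0..n-2] with buf[1..n-1]. memcmp only reads, so the
     * overlap is harmless, and it stops at the first mismatch. */
    return buf[0] == 0 && memcmp(buf, buf + 1, n - 1) == 0;
}
```

The comparison touches each byte at most twice and needs no separate zero-filled reference buffer, which is the trade-off the comment in write_not_zeroes describes.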

58
src/server/client.h Normal file

@@ -0,0 +1,58 @@
#ifndef CLIENT_H
#define CLIENT_H
#include <signal.h>
#include <time.h>
#include <inttypes.h>
/** CLIENT_HANDLER_TIMEOUT
* This is the length of time (in seconds) any request can be outstanding for.
* If we spend longer than this in a request, the killswitch timer fires and
* the client's socket is shut down (see client_killswitch_hit).
*/
#define CLIENT_HANDLER_TIMEOUT 120
/** CLIENT_KILLSWITCH_SIGNAL
* The signal number raised when *any* killswitch timer fires. The handler
* gets the fd of the client socket to work with.
*/
#define CLIENT_KILLSWITCH_SIGNAL ( SIGRTMIN + 1 )
struct client {
/* When we call pthread_join, if the thread is already dead
* we can get an ESRCH. Since we have no other way to tell
* if that ESRCH is from a dead thread or a thread that never
* existed, we use a `stopped` flag to indicate a thread which
* did exist, but went away. Only check this after a
* pthread_join call.
*/
int stopped;
int socket;
int fileno;
char *mapped;
uint64_t mapped_size;
struct self_pipe *stop_signal;
struct server *serve; /* FIXME: remove above duplication */
/* Have we seen a REQUEST_DISCONNECT message? */
int disconnect;
/* timer that shuts down this client's socket if a request has been
* outstanding too long, assuming use_killswitch is set in serve
*/
timer_t killswitch;
};
void client_killswitch_hit(int signal, siginfo_t * info, void *ptr);
void *client_serve(void *client_uncast);
struct client *client_create(struct server *serve, int socket);
void client_destroy(struct client *client);
void client_signal_stop(struct client *client);
#endif
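
How the killswitch handler learns which socket to act on can be shown without a timer. The sketch below queues a real-time signal to the current process with an integer payload and reads it back through si_value.sival_int, the same field client_killswitch_hit uses. All demo_ names and DEMO_KILLSWITCH_SIGNAL are hypothetical stand-ins. Here sigqueue() plays the part of the expiring timer, and the handler only records the fd instead of calling shutdown().

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical stand-in for CLIENT_KILLSWITCH_SIGNAL. */
#define DEMO_KILLSWITCH_SIGNAL (SIGRTMIN + 1)

/* Last fd seen by the handler; stays -1 until the signal arrives. */
volatile sig_atomic_t demo_last_fd = -1;

/* SA_SIGINFO-style handler: the payload travels in info->si_value. */
void demo_killswitch_hit(int signo, siginfo_t *info, void *ctx)
{
    (void) signo;
    (void) ctx;
    demo_last_fd = info->si_value.sival_int;
}

/* Install the handler. Returns 0 on success, -1 on failure. */
int demo_install_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = demo_killswitch_hit;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(DEMO_KILLSWITCH_SIGNAL, &sa, NULL);
}

/* Queue the signal to ourselves, carrying fd as its payload. */
int demo_fire(int fd)
{
    union sigval value;
    value.sival_int = fd;
    return sigqueue(getpid(), DEMO_KILLSWITCH_SIGNAL, value);
}
```

In a single-threaded process with the signal unblocked, the handler has run by the time demo_fire() returns, so demo_last_fd then holds the fd that was passed in. In client.c the same payload is attached once, through sigev_value in client_create, and delivered whenever the timer expires.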

613
src/server/control.c Normal file

@@ -0,0 +1,613 @@
/* FlexNBD server (C) Bytemark Hosting 2012
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
*/
/** The control server responds on a UNIX socket and services our "remote"
* commands which are used for changing the access control list, initiating
* a mirror process, or asking for status. The protocol is pretty simple -
* after connecting the client sends a series of LF-terminated lines, followed
* by a blank line (i.e. double LF). The first line is taken to be the command
* name to invoke, and the remaining lines before the double LF are its
* arguments.
*
* These commands can be invoked remotely from the command line, with the
* client code to be found in remote.c
*/
#include "control.h"
#include "mirror.h"
#include "serve.h"
#include "util.h"
#include "ioutil.h"
#include "parse.h"
#include "readwrite.h"
#include "bitset.h"
#include "self_pipe.h"
#include "acl.h"
#include "status.h"
#include "mbox.h"
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/un.h>
#include <unistd.h>
struct control *control_create(struct flexnbd *flexnbd, const char *csn)
{
struct control *control = xmalloc(sizeof(struct control));
NULLCHECK(csn);
control->flexnbd = flexnbd;
control->socket_name = csn;
control->open_signal = self_pipe_create();
control->close_signal = self_pipe_create();
control->mirror_state_mbox = mbox_create();
return control;
}
void control_signal_close(struct control *control)
{
NULLCHECK(control);
self_pipe_signal(control->close_signal);
}
void control_destroy(struct control *control)
{
NULLCHECK(control);
mbox_destroy(control->mirror_state_mbox);
self_pipe_destroy(control->close_signal);
self_pipe_destroy(control->open_signal);
free(control);
}
struct control_client *control_client_create(struct flexnbd *flexnbd,
int client_fd,
struct mbox *state_mbox)
{
NULLCHECK(flexnbd);
struct control_client *control_client =
xmalloc(sizeof(struct control_client));
control_client->socket = client_fd;
control_client->flexnbd = flexnbd;
control_client->mirror_state_mbox = state_mbox;
return control_client;
}
void control_client_destroy(struct control_client *client)
{
NULLCHECK(client);
free(client);
}
void control_respond(struct control_client *client);
void control_handle_client(struct control *control, int client_fd)
{
NULLCHECK(control);
NULLCHECK(control->flexnbd);
struct control_client *control_client =
control_client_create(control->flexnbd,
client_fd,
control->mirror_state_mbox);
/* We intentionally don't spawn a thread for the client here.
* This is to avoid having more than one thread potentially
* waiting on the migration commit status.
*/
control_respond(control_client);
}
void control_accept_client(struct control *control)
{
int client_fd;
union mysockaddr client_address;
socklen_t addrlen = sizeof(union mysockaddr);
client_fd =
accept(control->control_fd, &client_address.generic, &addrlen);
FATAL_IF(-1 == client_fd, "control accept failed");
control_handle_client(control, client_fd);
}
int control_accept(struct control *control)
{
NULLCHECK(control);
fd_set fds;
FD_ZERO(&fds);
FD_SET(control->control_fd, &fds);
self_pipe_fd_set(control->close_signal, &fds);
debug("Control thread selecting");
FATAL_UNLESS(0 < select(FD_SETSIZE, &fds, NULL, NULL, NULL),
"Control select failed.");
if (self_pipe_fd_isset(control->close_signal, &fds)) {
return 0;
}
if (FD_ISSET(control->control_fd, &fds)) {
control_accept_client(control);
}
return 1;
}
void control_accept_loop(struct control *control)
{
while (control_accept(control));
}
int open_control_socket(const char *socket_name)
{
struct sockaddr_un bind_address;
int control_fd;
if (!socket_name) {
fatal("Tried to open a control socket without a socket name");
}
control_fd = socket(AF_UNIX, SOCK_STREAM, 0);
FATAL_IF_NEGATIVE(control_fd, "Couldn't create control socket");
memset(&bind_address, 0, sizeof(struct sockaddr_un));
bind_address.sun_family = AF_UNIX;
strncpy(bind_address.sun_path, socket_name,
sizeof(bind_address.sun_path) - 1);
//unlink(socket_name); /* ignore failure */
FATAL_IF_NEGATIVE(bind
(control_fd, (struct sockaddr *) &bind_address,
sizeof(bind_address)),
"Couldn't bind control socket to %s: %s",
socket_name, strerror(errno)
);
FATAL_IF_NEGATIVE(listen(control_fd, 5),
"Couldn't listen on control socket");
return control_fd;
}
void control_listen(struct control *control)
{
NULLCHECK(control);
control->control_fd = open_control_socket(control->socket_name);
}
void control_wait_for_open_signal(struct control *control)
{
fd_set fds;
FD_ZERO(&fds);
self_pipe_fd_set(control->open_signal, &fds);
FATAL_IF_NEGATIVE(select(FD_SETSIZE, &fds, NULL, NULL, NULL),
"select() failed");
self_pipe_signal_clear(control->open_signal);
}
void control_serve(struct control *control)
{
NULLCHECK(control);
control_wait_for_open_signal(control);
control_listen(control);
while (control_accept(control));
}
void control_cleanup(struct control *control,
int fatal __attribute__ ((unused)))
{
NULLCHECK(control);
unlink(control->socket_name);
close(control->control_fd);
}
void *control_runner(void *control_uncast)
{
debug("Control thread");
NULLCHECK(control_uncast);
struct control *control = (struct control *) control_uncast;
error_set_handler((cleanup_handler *) control_cleanup, control);
control_serve(control);
control_cleanup(control, 0);
pthread_exit(NULL);
}
#define write_socket(msg) write(client_fd, (msg "\n"), strlen((msg))+1)
void control_write_mirror_response(enum mirror_state mirror_state,
int client_fd)
{
switch (mirror_state) {
case MS_INIT:
case MS_UNKNOWN:
write_socket("1: Mirror failed to initialise");
fatal("Impossible mirror state: %d", mirror_state);
case MS_FAIL_CONNECT:
write_socket("1: Mirror failed to connect");
break;
case MS_FAIL_REJECTED:
write_socket("1: Mirror was rejected");
break;
case MS_FAIL_NO_HELLO:
write_socket("1: Remote server failed to respond");
break;
case MS_FAIL_SIZE_MISMATCH:
write_socket("1: Remote size does not match local size");
break;
case MS_ABANDONED:
write_socket("1: Mirroring abandoned");
break;
case MS_GO:
case MS_DONE: /* Yes, I know we know better, but it's simpler this way */
write_socket("0: Mirror started");
break;
default:
fatal("Unhandled mirror state: %d", mirror_state);
}
}
#undef write_socket
/* Call this in the thread where you want to receive the mirror state */
enum mirror_state control_client_mirror_wait(struct control_client *client)
{
NULLCHECK(client);
NULLCHECK(client->mirror_state_mbox);
struct mbox *mbox = client->mirror_state_mbox;
enum mirror_state mirror_state;
enum mirror_state *contents;
contents = (enum mirror_state *) mbox_receive(mbox);
NULLCHECK(contents);
mirror_state = *contents;
free(contents);
return mirror_state;
}
#define write_socket(msg) write(client->socket, (msg "\n"), strlen((msg))+1)
/** Command parser to start mirror process from socket input */
int control_mirror(struct control_client *client, int linesc, char **lines)
{
NULLCHECK(client);
struct flexnbd *flexnbd = client->flexnbd;
union mysockaddr *connect_to = xmalloc(sizeof(union mysockaddr));
union mysockaddr *connect_from = NULL;
uint64_t max_Bps = UINT64_MAX;
int action_at_finish;
int raw_port;
if (linesc < 2) {
write_socket("1: mirror takes at least two parameters");
return -1;
}
if (parse_ip_to_sockaddr(&connect_to->generic, lines[0]) == 0) {
write_socket("1: bad IP address");
return -1;
}
raw_port = atoi(lines[1]);
if (raw_port < 0 || raw_port > 65535) {
write_socket("1: bad IP port number");
return -1;
}
connect_to->v4.sin_port = htobe16(raw_port);
action_at_finish = ACTION_EXIT;
if (linesc > 2) {
if (strcmp("exit", lines[2]) == 0) {
action_at_finish = ACTION_EXIT;
} else if (strcmp("unlink", lines[2]) == 0) {
action_at_finish = ACTION_UNLINK;
} else if (strcmp("nothing", lines[2]) == 0) {
action_at_finish = ACTION_NOTHING;
} else {
write_socket("1: action must be 'exit', 'unlink' or 'nothing'");
return -1;
}
}
if (linesc > 3) {
connect_from = xmalloc(sizeof(union mysockaddr));
if (parse_ip_to_sockaddr(&connect_from->generic, lines[3]) == 0) {
write_socket("1: bad bind address");
return -1;
}
}
if (linesc > 4) {
errno = 0;
max_Bps = strtoull(lines[4], NULL, 10);
if (errno == ERANGE) {
write_socket("1: max_bps out of range");
return -1;
} else if (errno != 0) {
write_socket("1: max_bps couldn't be parsed");
return -1;
}
}
if (linesc > 5) {
write_socket("1: unrecognised parameters to mirror");
return -1;
}
struct server *serve = flexnbd_server(flexnbd);
server_lock_start_mirror(serve);
{
if (server_mirror_can_start(serve)) {
serve->mirror_super = mirror_super_create(serve->filename,
connect_to,
connect_from,
max_Bps,
action_at_finish,
client->
mirror_state_mbox);
serve->mirror = serve->mirror_super->mirror;
server_prevent_mirror_start(serve);
} else {
if (serve->mirror_super) {
warn("Tried to start a second mirror run");
write_socket("1: mirror already running");
} else {
warn("Cannot start mirroring, shutting down");
write_socket("1: shutting down");
}
}
}
server_unlock_start_mirror(serve);
/* Do this outside the lock to minimise the length of time the
* sighandler can block the serve thread
*/
if (serve->mirror_super) {
FATAL_IF(0 != pthread_create(&serve->mirror_super->thread,
NULL,
mirror_super_runner,
serve),
"Failed to create mirror thread");
debug("Control thread mirror super waiting");
enum mirror_state state = control_client_mirror_wait(client);
debug("Control thread writing response");
control_write_mirror_response(state, client->socket);
}
debug("Control thread going away.");
return 0;
}
int control_mirror_max_bps(struct control_client *client, int linesc,
char **lines)
{
NULLCHECK(client);
NULLCHECK(client->flexnbd);
struct server *serve = flexnbd_server(client->flexnbd);
uint64_t max_Bps;
if (!serve->mirror_super) {
write_socket("1: Not currently mirroring");
return -1;
}
if (linesc != 1) {
write_socket("1: Bad format");
return -1;
}
errno = 0;
max_Bps = strtoull(lines[0], NULL, 10);
if (errno == ERANGE) {
write_socket("1: max_bps out of range");
return -1;
} else if (errno != 0) {
write_socket("1: max_bps couldn't be parsed");
return -1;
}
serve->mirror->max_bytes_per_second = max_Bps;
write_socket("0: updated");
return 0;
}
#undef write_socket
/** Command parser to alter access control list from socket input */
int control_acl(struct control_client *client, int linesc, char **lines)
{
NULLCHECK(client);
NULLCHECK(client->flexnbd);
struct flexnbd *flexnbd = client->flexnbd;
int default_deny = flexnbd_default_deny(flexnbd);
struct acl *new_acl = acl_create(linesc, lines, default_deny);
if (new_acl->len != linesc) {
warn("Bad ACL spec: %s", lines[new_acl->len]);
write(client->socket, "1: bad spec: ", 13);
write(client->socket, lines[new_acl->len],
strlen(lines[new_acl->len]));
write(client->socket, "\n", 1);
acl_destroy(new_acl);
} else {
flexnbd_replace_acl(flexnbd, new_acl);
info("ACL set");
write(client->socket, "0: updated\n", 11);
}
return 0;
}
int control_break(struct control_client *client,
int linesc __attribute__ ((unused)),
char **lines __attribute__ ((unused))
)
{
NULLCHECK(client);
NULLCHECK(client->flexnbd);
int result = 0;
struct flexnbd *flexnbd = client->flexnbd;
struct server *serve = flexnbd_server(flexnbd);
server_lock_start_mirror(serve);
{
if (server_is_mirroring(serve)) {
info("Signaling to abandon mirror");
server_abandon_mirror(serve);
debug("Abandon signaled");
if (server_is_closed(serve)) {
info("Mirror completed while canceling");
write(client->socket, "1: mirror completed\n", 20);
} else {
info("Mirror successfully stopped.");
write(client->socket, "0: mirror stopped\n", 18);
result = 1;
}
} else {
warn("Not mirroring.");
write(client->socket, "1: not mirroring\n", 17);
}
}
server_unlock_start_mirror(serve);
return result;
}
/** FIXME: add some useful statistics */
int control_status(struct control_client *client,
int linesc __attribute__ ((unused)),
char **lines __attribute__ ((unused))
)
{
NULLCHECK(client);
NULLCHECK(client->flexnbd);
struct status *status = flexnbd_status_create(client->flexnbd);
write(client->socket, "0: ", 3);
status_write(status, client->socket);
status_destroy(status);
return 0;
}
void control_client_cleanup(struct control_client *client,
int fatal __attribute__ ((unused)))
{
if (client->socket) {
close(client->socket);
}
/* This is wrongness */
if (server_acl_locked(client->flexnbd->serve)) {
server_unlock_acl(client->flexnbd->serve);
}
control_client_destroy(client);
}
/** Master command parser for control socket connections, delegates quickly */
void control_respond(struct control_client *client)
{
char **lines = NULL;
error_set_handler((cleanup_handler *) control_client_cleanup, client);
int i, linesc;
linesc = read_lines_until_blankline(client->socket, 256, &lines);
if (linesc < 1) {
write(client->socket, "9: missing command\n", 19);
/* ignore failure */
} else if (strcmp(lines[0], "acl") == 0) {
info("acl command received");
if (control_acl(client, linesc - 1, lines + 1) < 0) {
debug("acl command failed");
}
} else if (strcmp(lines[0], "mirror") == 0) {
info("mirror command received");
if (control_mirror(client, linesc - 1, lines + 1) < 0) {
debug("mirror command failed");
}
} else if (strcmp(lines[0], "break") == 0) {
info("break command received");
if (control_break(client, linesc - 1, lines + 1) < 0) {
debug("break command failed");
}
} else if (strcmp(lines[0], "status") == 0) {
info("status command received");
if (control_status(client, linesc - 1, lines + 1) < 0) {
debug("status command failed");
}
} else if (strcmp(lines[0], "mirror_max_bps") == 0) {
info("mirror_max_bps command received");
if (control_mirror_max_bps(client, linesc - 1, lines + 1) < 0) {
debug("mirror_max_bps command failed");
}
} else {
write(client->socket, "10: unknown command\n", 20);
}
for (i = 0; i < linesc; i++) {
free(lines[i]);
}
free(lines);
control_client_cleanup(client, 0);
debug("control command handled");
}
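Every handler in control.c answers with a single line of the form `<code>: <message>\n`, where code 0 signals success. The following is a minimal sketch of how a client could split such a line. `parse_control_response` is a hypothetical helper written for illustration and is not part of flexnbd.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of flexnbd): split a control-socket reply
 * of the form "<code>: <message>\n" into its numeric code and message.
 * Returns 0 on success, -1 if the line does not have that shape.
 * On success *message points into `line`, and *message_len excludes the
 * trailing newline, if there is one.
 */
int parse_control_response(const char *line, long *code,
                           const char **message, size_t *message_len)
{
    assert(line && code && message && message_len);

    char *end = NULL;
    long value = strtol(line, &end, 10);

    /* Require at least one digit, followed by ": " */
    if (end == line || end[0] != ':' || end[1] != ' ') {
        return -1;
    }

    const char *text = end + 2;
    size_t len = strlen(text);
    if (len > 0 && text[len - 1] == '\n') {
        len--;
    }

    *code = value;
    *message = text;
    *message_len = len;
    return 0;
}
```

A caller would read one line from the control socket, pass it to this function, and treat any non-zero code as a failure reported by the server.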

58
src/server/control.h Normal file

@@ -0,0 +1,58 @@
#ifndef CONTROL_H
#define CONTROL_H
/* We need this to avoid a complaint about struct server * in
* void accept_control_connection
*/
struct server;
#include "parse.h"
#include "mirror.h"
#include "serve.h"
#include "flexnbd.h"
#include "mbox.h"
struct control {
struct flexnbd *flexnbd;
int control_fd;
const char *socket_name;
pthread_t thread;
struct self_pipe *open_signal;
struct self_pipe *close_signal;
/* This is owned by the control object, and used by a
* mirror_super to communicate the state of a mirror attempt as
* early as feasible. It can't be owned by the mirror_super
* object because the mirror_super object can be freed at any
* time (including while the control_client is waiting on it),
* whereas the control object lasts for the lifetime of the
* process (and we can only have a mirror thread if the control
* thread has started it).
*/
struct mbox *mirror_state_mbox;
};
struct control_client {
int socket;
struct flexnbd *flexnbd;
/* Passed in on creation. We know it's all right to do this
* because we know there's only ever one control_client.
*/
struct mbox *mirror_state_mbox;
};
struct control *control_create(struct flexnbd *,
const char *control_socket_name);
void control_signal_close(struct control *);
void control_destroy(struct control *);
void *control_runner(void *);
void accept_control_connection(struct server *params, int client_fd,
union mysockaddr *client_address);
void serve_open_control_socket(struct server *params);
#endif

248
src/server/flexnbd.c Normal file

@@ -0,0 +1,248 @@
/* FlexNBD server (C) Bytemark Hosting 2012
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
*/
/** main() function for parsing and dispatching commands. Each mode has
* a corresponding structure which is filled in and passed to a do_ function
* elsewhere in the program.
*/
#include "flexnbd.h"
#include "serve.h"
#include "util.h"
#include "control.h"
#include "status.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/signalfd.h>
#include <fcntl.h>
#include <unistd.h>
#include <signal.h>
#include <getopt.h>
#include "acl.h"
int flexnbd_build_signal_fd(void)
{
sigset_t mask;
int sfd;
sigemptyset(&mask);
sigaddset(&mask, SIGTERM);
sigaddset(&mask, SIGQUIT);
sigaddset(&mask, SIGINT);
FATAL_UNLESS(0 == pthread_sigmask(SIG_BLOCK, &mask, NULL),
"Signal blocking failed");
sfd = signalfd(-1, &mask, 0);
FATAL_IF(-1 == sfd, "Failed to get a signal fd");
return sfd;
}
void flexnbd_create_shared(struct flexnbd *flexnbd,
const char *s_ctrl_sock)
{
NULLCHECK(flexnbd);
if (s_ctrl_sock) {
flexnbd->control = control_create(flexnbd, s_ctrl_sock);
} else {
flexnbd->control = NULL;
}
flexnbd->signal_fd = flexnbd_build_signal_fd();
}
struct flexnbd *flexnbd_create_serving(char *s_ip_address,
char *s_port,
char *s_file,
char *s_ctrl_sock,
int default_deny,
int acl_entries,
char **s_acl_entries,
int max_nbd_clients,
int use_killswitch)
{
struct flexnbd *flexnbd = xmalloc(sizeof(struct flexnbd));
flexnbd->serve = server_create(flexnbd,
s_ip_address,
s_port,
s_file,
default_deny,
acl_entries,
s_acl_entries,
max_nbd_clients, use_killswitch, 1);
flexnbd_create_shared(flexnbd, s_ctrl_sock);
// Beats installing one handler per client instance
if (use_killswitch) {
struct sigaction act = {
.sa_sigaction = client_killswitch_hit,
.sa_flags = SA_RESTART | SA_SIGINFO
};
FATAL_UNLESS(0 == sigaction(CLIENT_KILLSWITCH_SIGNAL, &act, NULL),
"Installing client killswitch signal failed");
}
return flexnbd;
}
struct flexnbd *flexnbd_create_listening(char *s_ip_address,
char *s_port,
char *s_file,
char *s_ctrl_sock,
int default_deny,
int acl_entries,
char **s_acl_entries)
{
struct flexnbd *flexnbd = xmalloc(sizeof(struct flexnbd));
flexnbd->serve = server_create(flexnbd,
s_ip_address,
s_port,
s_file,
default_deny,
acl_entries, s_acl_entries, 1, 0, 0);
flexnbd_create_shared(flexnbd, s_ctrl_sock);
// listen can't use killswitch, as mirror may pause on sending things
// for a very long time.
return flexnbd;
}
void flexnbd_spawn_control(struct flexnbd *flexnbd)
{
NULLCHECK(flexnbd);
NULLCHECK(flexnbd->control);
pthread_t *control_thread = &flexnbd->control->thread;
FATAL_UNLESS(0 == pthread_create(control_thread,
NULL,
control_runner,
flexnbd->control),
"Couldn't create the control thread");
}
void flexnbd_stop_control(struct flexnbd *flexnbd)
{
NULLCHECK(flexnbd);
NULLCHECK(flexnbd->control);
control_signal_close(flexnbd->control);
pthread_t tid = flexnbd->control->thread;
FATAL_UNLESS(0 == pthread_join(tid, NULL),
"Failed joining the control thread");
debug("Control thread %p pthread_join returned", tid);
}
int flexnbd_signal_fd(struct flexnbd *flexnbd)
{
NULLCHECK(flexnbd);
return flexnbd->signal_fd;
}
void flexnbd_destroy(struct flexnbd *flexnbd)
{
NULLCHECK(flexnbd);
if (flexnbd->control) {
control_destroy(flexnbd->control);
}
close(flexnbd->signal_fd);
free(flexnbd);
}
struct server *flexnbd_server(struct flexnbd *flexnbd)
{
NULLCHECK(flexnbd);
return flexnbd->serve;
}
void flexnbd_replace_acl(struct flexnbd *flexnbd, struct acl *acl)
{
NULLCHECK(flexnbd);
server_replace_acl(flexnbd_server(flexnbd), acl);
}
struct status *flexnbd_status_create(struct flexnbd *flexnbd)
{
NULLCHECK(flexnbd);
struct status *status;
status = status_create(flexnbd_server(flexnbd));
return status;
}
void flexnbd_set_server(struct flexnbd *flexnbd, struct server *serve)
{
NULLCHECK(flexnbd);
flexnbd->serve = serve;
}
/* Get the default_deny of the current server object. */
int flexnbd_default_deny(struct flexnbd *flexnbd)
{
NULLCHECK(flexnbd);
return server_default_deny(flexnbd->serve);
}
void make_writable(const char *filename)
{
NULLCHECK(filename);
FATAL_IF_NEGATIVE(chmod(filename, S_IWUSR),
"Couldn't chmod %s: %s", filename, strerror(errno));
}
int flexnbd_serve(struct flexnbd *flexnbd)
{
NULLCHECK(flexnbd);
int success;
struct self_pipe *open_signal = NULL;
if (flexnbd->control) {
debug("Spawning control thread");
flexnbd_spawn_control(flexnbd);
open_signal = flexnbd->control->open_signal;
}
success = do_serve(flexnbd->serve, open_signal);
debug("do_serve success is %d", success);
if (flexnbd->control) {
debug("Stopping control thread");
flexnbd_stop_control(flexnbd);
debug("Control thread stopped");
}
return success;
}
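`flexnbd_build_signal_fd()` blocks the termination signals and then reads them through a signalfd(2) descriptor instead of an asynchronous handler. The sketch below shows that pattern in isolation for a single signal. It is Linux-only, the function names are hypothetical, and it uses `sigprocmask` on the assumption of a single-threaded program (flexnbd itself calls `pthread_sigmask`).

```c
#define _GNU_SOURCE             /* expose POSIX/Linux declarations under strict -std modes */
#include <assert.h>
#include <signal.h>
#include <sys/signalfd.h>
#include <unistd.h>

/* Hypothetical helper: block `signo` for the calling thread and return a
 * signalfd(2) descriptor that becomes readable while it is pending.
 * Returns -1 on failure.
 */
int open_signal_fd(int signo)
{
    sigset_t mask;

    if (sigemptyset(&mask) != 0 || sigaddset(&mask, signo) != 0) {
        return -1;
    }
    /* The signal must be blocked, otherwise its default action runs
     * before the descriptor ever sees it. */
    if (sigprocmask(SIG_BLOCK, &mask, NULL) != 0) {
        return -1;
    }
    return signalfd(-1, &mask, 0);
}

/* Hypothetical helper: read one pending signal from the descriptor,
 * blocking until one arrives. Returns the signal number, or -1 on a
 * failed or short read.
 */
int read_signal_fd(int sfd)
{
    assert(sfd >= 0);

    struct signalfd_siginfo info;
    ssize_t got = read(sfd, &info, sizeof(info));

    if (got != (ssize_t) sizeof(info)) {
        return -1;
    }
    return (int) info.ssi_signo;
}
```

Because the descriptor is an ordinary file descriptor, it can sit in the same `select()` set as the listening sockets, which is presumably why the server exposes it through `flexnbd_signal_fd()`.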

63
src/server/flexnbd.h Normal file

@@ -0,0 +1,63 @@
#ifndef FLEXNBD_H
#define FLEXNBD_H
#include "acl.h"
#include "mirror.h"
#include "serve.h"
#include "proxy.h"
#include "client.h"
#include "self_pipe.h"
#include "mbox.h"
#include "control.h"
#include "flexthread.h"
/* Carries the "globals". */
struct flexnbd {
/* Our serve pointer should never be dereferenced outside a
* flexnbd_switch_lock/unlock pair.
*/
struct server *serve;
/* We only have a control object if a control socket name was
* passed on the command line.
*/
struct control *control;
/* File descriptor for a signalfd(2) signal stream. */
int signal_fd;
};
struct flexnbd *flexnbd_create(void);
struct flexnbd *flexnbd_create_serving(char *s_ip_address,
char *s_port,
char *s_file,
char *s_ctrl_sock,
int default_deny,
int acl_entries,
char **s_acl_entries,
int max_nbd_clients,
int use_killswitch);
struct flexnbd *flexnbd_create_listening(char *s_ip_address,
char *s_port,
char *s_file,
char *s_ctrl_sock,
int default_deny,
int acl_entries,
char **s_acl_entries);
void flexnbd_destroy(struct flexnbd *);
enum mirror_state;
enum mirror_state flexnbd_get_mirror_state(struct flexnbd *);
int flexnbd_default_deny(struct flexnbd *);
void flexnbd_set_server(struct flexnbd *flexnbd, struct server *serve);
int flexnbd_signal_fd(struct flexnbd *flexnbd);
int flexnbd_serve(struct flexnbd *flexnbd);
int flexnbd_proxy(struct flexnbd *flexnbd);
struct server *flexnbd_server(struct flexnbd *flexnbd);
void flexnbd_replace_acl(struct flexnbd *flexnbd, struct acl *acl);
struct status *flexnbd_status_create(struct flexnbd *flexnbd);
#endif

73
src/server/flexthread.c Normal file

@@ -0,0 +1,73 @@
#include "flexthread.h"
#include "util.h"
#include <pthread.h>
struct flexthread_mutex *flexthread_mutex_create(void)
{
struct flexthread_mutex *ftm =
xmalloc(sizeof(struct flexthread_mutex));
FATAL_UNLESS(0 == pthread_mutex_init(&ftm->mutex, NULL),
"Mutex initialisation failed");
return ftm;
}
void flexthread_mutex_destroy(struct flexthread_mutex *ftm)
{
NULLCHECK(ftm);
if (flexthread_mutex_held(ftm)) {
flexthread_mutex_unlock(ftm);
} else if ((pthread_t) NULL != ftm->holder) {
/* This "should never happen": if we can try to destroy
* a mutex currently held by another thread, there's a
* logic bug somewhere. I know the test here is racy,
* but there's not a lot we can do about it at this
* point.
*/
fatal("Attempted to destroy a flexthread_mutex"
" held by another thread!");
}
FATAL_UNLESS(0 == pthread_mutex_destroy(&ftm->mutex),
"Mutex destroy failed");
free(ftm);
}
int flexthread_mutex_lock(struct flexthread_mutex *ftm)
{
NULLCHECK(ftm);
int failure = pthread_mutex_lock(&ftm->mutex);
if (0 == failure) {
ftm->holder = pthread_self();
}
return failure;
}
int flexthread_mutex_unlock(struct flexthread_mutex *ftm)
{
NULLCHECK(ftm);
pthread_t orig = ftm->holder;
ftm->holder = (pthread_t) NULL;
int failure = pthread_mutex_unlock(&ftm->mutex);
if (0 != failure) {
ftm->holder = orig;
}
return failure;
}
int flexthread_mutex_held(struct flexthread_mutex *ftm)
{
NULLCHECK(ftm);
return pthread_self() == ftm->holder;
}
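flexthread.c records the owning thread by comparing `pthread_t` values with `==` and by casting `NULL` to `pthread_t`. That works where `pthread_t` is an integer or pointer type, but POSIX only guarantees comparison through `pthread_equal()`. A minimal sketch of the same idea with an explicit "has holder" flag follows. The `owned_mutex` names are hypothetical, and the sketch keeps the original's caveat that the ownership check is only reliable for the calling thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical variant of struct flexthread_mutex for illustration. */
struct owned_mutex {
    pthread_mutex_t mutex;
    pthread_t holder;           /* only meaningful while has_holder != 0 */
    int has_holder;
};

int owned_mutex_init(struct owned_mutex *om)
{
    assert(om != NULL);
    om->has_holder = 0;
    return pthread_mutex_init(&om->mutex, NULL);
}

int owned_mutex_lock(struct owned_mutex *om)
{
    assert(om != NULL);
    int failure = pthread_mutex_lock(&om->mutex);
    if (failure == 0) {
        om->holder = pthread_self();
        om->has_holder = 1;
    }
    return failure;
}

int owned_mutex_unlock(struct owned_mutex *om)
{
    assert(om != NULL);
    int had_holder = om->has_holder;
    /* Clear ownership before releasing, restore it if the unlock fails. */
    om->has_holder = 0;
    int failure = pthread_mutex_unlock(&om->mutex);
    if (failure != 0) {
        om->has_holder = had_holder;
    }
    return failure;
}

/* Returns 1 if the calling thread currently holds the mutex, else 0. */
int owned_mutex_held(struct owned_mutex *om)
{
    assert(om != NULL);
    return om->has_holder && pthread_equal(pthread_self(), om->holder);
}
```

The explicit flag avoids relying on any particular `pthread_t` value meaning "nobody", at the cost of one extra field.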


@@ -15,15 +15,15 @@
*/
struct flexthread_mutex {
pthread_mutex_t mutex;
pthread_t holder;
pthread_mutex_t mutex;
pthread_t holder;
};
struct flexthread_mutex * flexthread_mutex_create(void);
void flexthread_mutex_destroy( struct flexthread_mutex * );
struct flexthread_mutex *flexthread_mutex_create(void);
void flexthread_mutex_destroy(struct flexthread_mutex *);
int flexthread_mutex_lock( struct flexthread_mutex * );
int flexthread_mutex_unlock( struct flexthread_mutex * );
int flexthread_mutex_held( struct flexthread_mutex * );
int flexthread_mutex_lock(struct flexthread_mutex *);
int flexthread_mutex_unlock(struct flexthread_mutex *);
int flexthread_mutex_held(struct flexthread_mutex *);
#endif

77
src/server/mbox.c Normal file

@@ -0,0 +1,77 @@
#include "mbox.h"
#include "util.h"
#include <pthread.h>
struct mbox *mbox_create(void)
{
struct mbox *mbox = xmalloc(sizeof(struct mbox));
FATAL_UNLESS(0 == pthread_cond_init(&mbox->filled_cond, NULL),
"Failed to initialise a condition variable");
FATAL_UNLESS(0 == pthread_cond_init(&mbox->emptied_cond, NULL),
"Failed to initialise a condition variable");
FATAL_UNLESS(0 == pthread_mutex_init(&mbox->mutex, NULL),
"Failed to initialise a mutex");
return mbox;
}
void mbox_post(struct mbox *mbox, void *contents)
{
pthread_mutex_lock(&mbox->mutex);
{
while (mbox->full) {
pthread_cond_wait(&mbox->emptied_cond, &mbox->mutex);
}
mbox->contents = contents;
mbox->full = 1;
while (0 != pthread_cond_signal(&mbox->filled_cond));
}
pthread_mutex_unlock(&mbox->mutex);
}
void *mbox_contents(struct mbox *mbox)
{
return mbox->contents;
}
int mbox_is_full(struct mbox *mbox)
{
return mbox->full;
}
void *mbox_receive(struct mbox *mbox)
{
NULLCHECK(mbox);
void *result;
pthread_mutex_lock(&mbox->mutex);
{
while (!mbox->full) {
pthread_cond_wait(&mbox->filled_cond, &mbox->mutex);
}
mbox->full = 0;
result = mbox->contents;
mbox->contents = NULL;
while (0 != pthread_cond_signal(&mbox->emptied_cond));
}
pthread_mutex_unlock(&mbox->mutex);
return result;
}
void mbox_destroy(struct mbox *mbox)
{
NULLCHECK(mbox);
while (0 != pthread_cond_destroy(&mbox->emptied_cond));
while (0 != pthread_cond_destroy(&mbox->filled_cond));
while (0 != pthread_mutex_destroy(&mbox->mutex));
free(mbox);
}
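POSIX allows `pthread_cond_wait()` to return without the condition having been signalled, so a wait is normally wrapped in a loop that re-checks its predicate. The sketch below is a standalone one-slot mailbox written that way. The `slot_*` names are hypothetical and only mirror the shape of `struct mbox`.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical one-slot mailbox, modelled loosely on struct mbox. */
struct slot {
    void *contents;
    int full;                   /* separate flag so NULL can be posted */
    pthread_mutex_t mutex;
    pthread_cond_t filled;      /* signalled by slot_post */
    pthread_cond_t emptied;     /* signalled by slot_receive */
};

/* Returns 0 on success, -1 if any pthread object failed to initialise. */
int slot_init(struct slot *s)
{
    assert(s != NULL);
    s->contents = NULL;
    s->full = 0;
    if (pthread_mutex_init(&s->mutex, NULL) != 0) return -1;
    if (pthread_cond_init(&s->filled, NULL) != 0) return -1;
    if (pthread_cond_init(&s->emptied, NULL) != 0) return -1;
    return 0;
}

/* Put something in the slot, blocking while it is already full. */
void slot_post(struct slot *s, void *contents)
{
    assert(s != NULL);
    pthread_mutex_lock(&s->mutex);
    while (s->full) {           /* loop guards against spurious wakeups */
        pthread_cond_wait(&s->emptied, &s->mutex);
    }
    s->contents = contents;
    s->full = 1;
    pthread_cond_signal(&s->filled);
    pthread_mutex_unlock(&s->mutex);
}

/* Take the contents out of the slot, blocking while it is empty. */
void *slot_receive(struct slot *s)
{
    assert(s != NULL);
    pthread_mutex_lock(&s->mutex);
    while (!s->full) {          /* loop guards against spurious wakeups */
        pthread_cond_wait(&s->filled, &s->mutex);
    }
    void *result = s->contents;
    s->contents = NULL;
    s->full = 0;
    pthread_cond_signal(&s->emptied);
    pthread_mutex_unlock(&s->mutex);
    return result;
}
```

In typical use one thread calls `slot_post()` and another calls `slot_receive()`. A post followed by a receive in the same thread also works, since neither call blocks in that order.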


@@ -14,42 +14,42 @@
struct mbox {
void * contents;
void *contents;
/** Marker to tell us if there's content in the box.
* Keeping this separate allows us to use NULL for the contents.
*/
int full;
int full;
/** This gets signaled by mbox_post, and waited on by
* mbox_receive */
pthread_cond_t filled_cond;
pthread_cond_t filled_cond;
/** This is signaled by mbox_receive, and waited on by mbox_post */
pthread_cond_t emptied_cond;
pthread_mutex_t mutex;
pthread_cond_t emptied_cond;
pthread_mutex_t mutex;
};
/* Create an mbox. */
struct mbox * mbox_create(void);
struct mbox *mbox_create(void);
/* Put something in the mbox, blocking if it's already full.
* That something can be NULL if you want.
*/
void mbox_post( struct mbox *, void *);
void mbox_post(struct mbox *, void *);
/* See what's in the mbox. This isn't thread-safe. */
void * mbox_contents( struct mbox *);
void *mbox_contents(struct mbox *);
/* See if anything has been put into the mbox. This isn't thread-safe.
* */
int mbox_is_full( struct mbox *);
int mbox_is_full(struct mbox *);
/* Get the contents from the mbox, blocking if there's nothing there. */
void * mbox_receive( struct mbox *);
void *mbox_receive(struct mbox *);
/* Free the mbox and destroy the associated pthread bits. */
void mbox_destroy( struct mbox *);
void mbox_destroy(struct mbox *);
#endif

1071
src/server/mirror.c Normal file

File diff suppressed because it is too large

139
src/server/mirror.h Normal file

@@ -0,0 +1,139 @@
#ifndef MIRROR_H
#define MIRROR_H
#include <sys/types.h>
#include <unistd.h>
#include <pthread.h>
#include "bitset.h"
#include "self_pipe.h"
enum mirror_state;
#include "serve.h"
#include "mbox.h"
/* MS_CONNECT_TIME_SECS
* The length of time after which the sender will assume a connect() to
* the destination has failed.
*/
#define MS_CONNECT_TIME_SECS 60
/* MS_CONVERGE_TIME_SECS
* The length of time a migration must be estimated to have remaining for us to
* disconnect clients for convergence
*
* TODO: Make this configurable so refusing-to-converge clients can be manually
* fixed.
* TODO: Make this adaptive - 5 seconds is fine, as long as we can guarantee
* that all migrations will be able to converge in time. We'd add a new
* state between open and closed, where gradually-increasing latency is
* added to client requests to allow the mirror to be faster.
*/
#define MS_CONVERGE_TIME_SECS 5
/* MS_HELLO_TIME_SECS
* The length of time the sender will wait for the NBD hello message
* after connect() before aborting the connection attempt.
*/
#define MS_HELLO_TIME_SECS 5
/* MS_RETRY_DELAY_SECS
* The delay after a failed migration attempt before launching another
* thread to try again.
*/
#define MS_RETRY_DELAY_SECS 1
/* MS_REQUEST_LIMIT_SECS
* We must receive a reply to a request within this time. For a read
* request, this is the time between the end of the NBD request and the
* start of the NBD reply. For a write request, this is the time
* between the end of the written data and the start of the NBD reply.
* Can be overridden by the environment variable:
* FLEXNBD_MS_REQUEST_LIMIT_SECS
*/
#define MS_REQUEST_LIMIT_SECS 60
#define MS_REQUEST_LIMIT_SECS_F 60.0
enum mirror_finish_action {
ACTION_EXIT,
ACTION_UNLINK,
ACTION_NOTHING
};
enum mirror_state {
MS_UNKNOWN,
MS_INIT,
MS_GO,
MS_ABANDONED,
MS_DONE,
MS_FAIL_CONNECT,
MS_FAIL_REJECTED,
MS_FAIL_NO_HELLO,
MS_FAIL_SIZE_MISMATCH
};
struct mirror {
pthread_t thread;
/* Signal to this then join the thread if you want to abandon mirroring */
struct self_pipe *abandon_signal;
union mysockaddr *connect_to;
union mysockaddr *connect_from;
int client;
const char *filename;
/* Limiter, used to restrict migration speed. Only dirty bytes (those going
* over the network) are considered. */
uint64_t max_bytes_per_second;
enum mirror_finish_action action_at_finish;
char *mapped;
/* We need to send every byte at least once; we do so by */
uint64_t offset;
enum mirror_state commit_state;
/* commit_signal is sent immediately after attempting to connect
* and checking the remote size, whether successful or not.
*/
struct mbox *commit_signal;
/* The time (from monotonic_time_ms()) the migration was started. Can be
* used to calculate bps, etc. */
uint64_t migration_started;
/* Running count of all bytes we've transferred */
uint64_t all_dirty;
};
struct mirror_super {
struct mirror *mirror;
pthread_t thread;
struct mbox *state_mbox;
};
/* We need these declarations to get around circular dependencies in the
* .h's
*/
struct server;
struct flexnbd;
struct mirror_super *mirror_super_create(const char *filename,
union mysockaddr *connect_to,
union mysockaddr *connect_from,
uint64_t max_Bps,
enum mirror_finish_action
action_at_finish,
struct mbox *state_mbox);
void *mirror_super_runner(void *serve_uncast);
#endif
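The comment on `MS_REQUEST_LIMIT_SECS` says the limit can be overridden through the `FLEXNBD_MS_REQUEST_LIMIT_SECS` environment variable. The code that does so lives in mirror.c, which is not shown here, so the following is only a sketch of one reasonable way to read such an override: parse it as a positive number of seconds and fall back to the compiled-in default otherwise. Both function names and the validation rules are assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <float.h>
#include <stdlib.h>

#define DEFAULT_REQUEST_LIMIT_SECS 60.0 /* mirrors MS_REQUEST_LIMIT_SECS_F */

/* Hypothetical helper: interpret `value` as a positive, finite number of
 * seconds. Returns `fallback` if value is NULL, empty, not a number,
 * followed by trailing junk, non-positive, or out of range.
 */
double request_limit_from_string(const char *value, double fallback)
{
    assert(fallback > 0.0);

    if (value == NULL || *value == '\0') {
        return fallback;
    }

    char *end = NULL;
    errno = 0;
    double secs = strtod(value, &end);

    if (errno != 0 || end == value || *end != '\0') {
        return fallback;
    }
    /* Rejects zero, negatives, NaN and infinity in one test. */
    if (!(secs > 0.0 && secs <= DBL_MAX)) {
        return fallback;
    }
    return secs;
}

/* Hypothetical wrapper: look the override up in the environment. */
double request_limit_secs(void)
{
    return request_limit_from_string(getenv("FLEXNBD_MS_REQUEST_LIMIT_SECS"),
                                     DEFAULT_REQUEST_LIMIT_SECS);
}
```

Keeping the parsing separate from the `getenv()` call makes the validation easy to exercise without touching the process environment.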

889
src/server/mode.c Normal file

@@ -0,0 +1,889 @@
#include "mode.h"
#include "flexnbd.h"
#include <getopt.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
static struct option serve_options[] = {
GETOPT_HELP,
GETOPT_ADDR,
GETOPT_PORT,
GETOPT_FILE,
GETOPT_SOCK,
GETOPT_DENY,
GETOPT_QUIET,
GETOPT_KILLSWITCH,
GETOPT_VERBOSE,
{0}
};
static char serve_short_options[] = "hl:p:f:s:dk" SOPT_QUIET SOPT_VERBOSE;
static char serve_help_text[] =
"Usage: flexnbd " CMD_SERVE " <options> [<acl address>*]\n\n"
"Serve FILE from ADDR:PORT, with an optional control socket at SOCK.\n\n"
HELP_LINE
"\t--" OPT_ADDR ",-l <ADDR>\tThe address to serve on.\n"
"\t--" OPT_PORT ",-p <PORT>\tThe port to serve on.\n"
"\t--" OPT_FILE ",-f <FILE>\tThe file to serve.\n"
"\t--" OPT_DENY ",-d\tDeny connections by default unless in ACL.\n"
"\t--" OPT_KILLSWITCH
",-k \tKill the server if a request takes 120 seconds.\n" SOCK_LINE
VERBOSE_LINE QUIET_LINE;
static struct option listen_options[] = {
GETOPT_HELP,
GETOPT_ADDR,
GETOPT_PORT,
GETOPT_FILE,
GETOPT_SOCK,
GETOPT_DENY,
GETOPT_QUIET,
GETOPT_VERBOSE,
{0}
};
static char listen_short_options[] = "hl:p:f:s:d" SOPT_QUIET SOPT_VERBOSE;
static char listen_help_text[] =
"Usage: flexnbd " CMD_LISTEN " <options> [<acl address>*]\n\n"
"Listen for an incoming migration on ADDR:PORT.\n\n"
HELP_LINE
"\t--" OPT_ADDR ",-l <ADDR>\tThe address to listen on.\n"
"\t--" OPT_PORT ",-p <PORT>\tThe port to listen on.\n"
"\t--" OPT_FILE ",-f <FILE>\tThe file to serve.\n"
"\t--" OPT_DENY ",-d\tDeny connections by default unless in ACL.\n"
SOCK_LINE VERBOSE_LINE QUIET_LINE;
static struct option read_options[] = {
GETOPT_HELP,
GETOPT_ADDR,
GETOPT_PORT,
GETOPT_FROM,
GETOPT_SIZE,
GETOPT_BIND,
GETOPT_QUIET,
GETOPT_VERBOSE,
{0}
};
static char read_short_options[] = "hl:p:F:S:b:" SOPT_QUIET SOPT_VERBOSE;
static char read_help_text[] =
"Usage: flexnbd " CMD_READ " <options>\n\n"
"Read SIZE bytes from a server at ADDR:PORT to stdout, starting at OFFSET.\n\n"
HELP_LINE
"\t--" OPT_ADDR ",-l <ADDR>\tThe address to read from.\n"
"\t--" OPT_PORT ",-p <PORT>\tThe port to read from.\n"
"\t--" OPT_FROM ",-F <OFFSET>\tByte offset to read from.\n"
"\t--" OPT_SIZE ",-S <SIZE>\tBytes to read.\n"
BIND_LINE VERBOSE_LINE QUIET_LINE;
static struct option *write_options = read_options;
static char *write_short_options = read_short_options;
static char write_help_text[] =
"Usage: flexnbd " CMD_WRITE " <options>\n\n"
"Write SIZE bytes from stdin to a server at ADDR:PORT, starting at OFFSET.\n\n"
HELP_LINE
"\t--" OPT_ADDR ",-l <ADDR>\tThe address to write to.\n"
"\t--" OPT_PORT ",-p <PORT>\tThe port to write to.\n"
"\t--" OPT_FROM ",-F <OFFSET>\tByte offset to write from.\n"
"\t--" OPT_SIZE ",-S <SIZE>\tBytes to write.\n"
BIND_LINE VERBOSE_LINE QUIET_LINE;
static struct option acl_options[] = {
GETOPT_HELP,
GETOPT_SOCK,
GETOPT_QUIET,
GETOPT_VERBOSE,
{0}
};
static char acl_short_options[] = "hs:" SOPT_QUIET SOPT_VERBOSE;
static char acl_help_text[] =
"Usage: flexnbd " CMD_ACL " <options> [<acl address>+]\n\n"
"Set the access control list for a server with control socket SOCK.\n\n"
HELP_LINE SOCK_LINE VERBOSE_LINE QUIET_LINE;
static struct option mirror_speed_options[] = {
GETOPT_HELP,
GETOPT_SOCK,
GETOPT_MAX_SPEED,
GETOPT_QUIET,
GETOPT_VERBOSE,
{0}
};
static char mirror_speed_short_options[] = "hs:m:" SOPT_QUIET SOPT_VERBOSE;
static char mirror_speed_help_text[] =
"Usage: flexnbd " CMD_MIRROR_SPEED " <options>\n\n"
"Set the maximum speed of a migration from a mirroring server listening on SOCK.\n\n"
HELP_LINE SOCK_LINE MAX_SPEED_LINE VERBOSE_LINE QUIET_LINE;
static struct option mirror_options[] = {
GETOPT_HELP,
GETOPT_SOCK,
GETOPT_ADDR,
GETOPT_PORT,
GETOPT_UNLINK,
GETOPT_BIND,
GETOPT_QUIET,
GETOPT_VERBOSE,
{0}
};
static char mirror_short_options[] = "hs:l:p:ub:" SOPT_QUIET SOPT_VERBOSE;
static char mirror_help_text[] =
"Usage: flexnbd " CMD_MIRROR " <options>\n\n"
"Start mirroring from the server with control socket SOCK to one at ADDR:PORT.\n\n"
HELP_LINE
"\t--" OPT_ADDR ",-l <ADDR>\tThe address to mirror to.\n"
"\t--" OPT_PORT ",-p <PORT>\tThe port to mirror to.\n"
SOCK_LINE
"\t--" OPT_UNLINK ",-u\tUnlink the local file when done.\n"
BIND_LINE VERBOSE_LINE QUIET_LINE;
static struct option break_options[] = {
GETOPT_HELP,
GETOPT_SOCK,
GETOPT_QUIET,
GETOPT_VERBOSE,
{0}
};
static char break_short_options[] = "hs:" SOPT_QUIET SOPT_VERBOSE;
static char break_help_text[] =
"Usage: flexnbd " CMD_BREAK " <options>\n\n"
"Stop mirroring from the server with control socket SOCK.\n\n"
HELP_LINE SOCK_LINE VERBOSE_LINE QUIET_LINE;
static struct option status_options[] = {
GETOPT_HELP,
GETOPT_SOCK,
GETOPT_QUIET,
GETOPT_VERBOSE,
{0}
};
static char status_short_options[] = "hs:" SOPT_QUIET SOPT_VERBOSE;
static char status_help_text[] =
"Usage: flexnbd " CMD_STATUS " <options>\n\n"
"Get the status for a server with control socket SOCK.\n\n"
HELP_LINE SOCK_LINE VERBOSE_LINE QUIET_LINE;
char help_help_text_arr[] =
"Usage: flexnbd <cmd> [cmd options]\n\n"
"Commands:\n"
"\tflexnbd serve\n"
"\tflexnbd listen\n"
"\tflexnbd read\n"
"\tflexnbd write\n"
"\tflexnbd acl\n"
"\tflexnbd mirror\n"
"\tflexnbd mirror-speed\n"
"\tflexnbd break\n"
"\tflexnbd status\n"
"\tflexnbd help\n\n" "See flexnbd help <cmd> for further info\n";
/* Slightly odd array/pointer pair to stop the compiler from complaining
* about symbol sizes
*/
char *help_help_text = help_help_text_arr;
void do_read(struct mode_readwrite_params *params);
void do_write(struct mode_readwrite_params *params);
void do_remote_command(char *command, char *mode, int argc, char **argv);
void read_serve_param(int c, char **ip_addr, char **ip_port, char **file,
char **sock, int *default_deny, int *use_killswitch)
{
switch (c) {
case 'h':
fprintf(stdout, "%s\n", serve_help_text);
exit(0);
case 'l':
*ip_addr = optarg;
break;
case 'p':
*ip_port = optarg;
break;
case 'f':
*file = optarg;
break;
case 's':
*sock = optarg;
break;
case 'd':
*default_deny = 1;
break;
case 'q':
log_level = QUIET_LOG_LEVEL;
break;
case 'v':
log_level = VERBOSE_LOG_LEVEL;
break;
case 'k':
*use_killswitch = 1;
break;
default:
exit_err(serve_help_text);
break;
}
}
void read_listen_param(int c,
char **ip_addr,
char **ip_port,
char **file, char **sock, int *default_deny)
{
switch (c) {
case 'h':
fprintf(stdout, "%s\n", listen_help_text);
exit(0);
case 'l':
*ip_addr = optarg;
break;
case 'p':
*ip_port = optarg;
break;
case 'f':
*file = optarg;
break;
case 's':
*sock = optarg;
break;
case 'd':
*default_deny = 1;
break;
case 'q':
log_level = QUIET_LOG_LEVEL;
break;
case 'v':
log_level = VERBOSE_LOG_LEVEL;
break;
default:
exit_err(listen_help_text);
break;
}
}
void read_readwrite_param(int c, char **ip_addr, char **ip_port,
char **bind_addr, char **from, char **size,
char *err_text)
{
switch (c) {
case 'h':
fprintf(stdout, "%s\n", err_text);
exit(0);
case 'l':
*ip_addr = optarg;
break;
case 'p':
*ip_port = optarg;
break;
case 'F':
*from = optarg;
break;
case 'S':
*size = optarg;
break;
case 'b':
*bind_addr = optarg;
break;
case 'q':
log_level = QUIET_LOG_LEVEL;
break;
case 'v':
log_level = VERBOSE_LOG_LEVEL;
break;
default:
exit_err(err_text);
break;
}
}
void read_sock_param(int c, char **sock, char *help_text)
{
switch (c) {
case 'h':
fprintf(stdout, "%s\n", help_text);
exit(0);
case 's':
*sock = optarg;
break;
case 'q':
log_level = QUIET_LOG_LEVEL;
break;
case 'v':
log_level = VERBOSE_LOG_LEVEL;
break;
default:
exit_err(help_text);
break;
}
}
void read_acl_param(int c, char **sock)
{
read_sock_param(c, sock, acl_help_text);
}
void read_mirror_speed_param(int c, char **sock, char **max_speed)
{
switch (c) {
case 'h':
fprintf(stdout, "%s\n", mirror_speed_help_text);
exit(0);
case 's':
*sock = optarg;
break;
case 'm':
*max_speed = optarg;
break;
case 'q':
log_level = QUIET_LOG_LEVEL;
break;
case 'v':
log_level = VERBOSE_LOG_LEVEL;
break;
default:
exit_err(mirror_speed_help_text);
break;
}
}
void read_mirror_param(int c,
char **sock,
char **ip_addr,
char **ip_port, int *unlink, char **bind_addr)
{
switch (c) {
case 'h':
fprintf(stdout, "%s\n", mirror_help_text);
exit(0);
case 's':
*sock = optarg;
break;
case 'l':
*ip_addr = optarg;
break;
case 'p':
*ip_port = optarg;
break;
case 'u':
*unlink = 1;
break;
case 'b':
*bind_addr = optarg;
break;
case 'q':
log_level = QUIET_LOG_LEVEL;
break;
case 'v':
log_level = VERBOSE_LOG_LEVEL;
break;
default:
exit_err(mirror_help_text);
break;
}
}
void read_break_param(int c, char **sock)
{
switch (c) {
case 'h':
fprintf(stdout, "%s\n", break_help_text);
exit(0);
case 's':
*sock = optarg;
break;
case 'q':
log_level = QUIET_LOG_LEVEL;
break;
case 'v':
log_level = VERBOSE_LOG_LEVEL;
break;
default:
exit_err(break_help_text);
break;
}
}
void read_status_param(int c, char **sock)
{
read_sock_param(c, sock, status_help_text);
}
int mode_serve(int argc, char *argv[])
{
int c;
char *ip_addr = NULL;
char *ip_port = NULL;
char *file = NULL;
char *sock = NULL;
int default_deny = 0; // not on by default
int use_killswitch = 0;
int err = 0;
int success;
struct flexnbd *flexnbd;
while (1) {
c = getopt_long(argc, argv, serve_short_options, serve_options,
NULL);
if (c == -1) {
break;
}
read_serve_param(c, &ip_addr, &ip_port, &file, &sock,
&default_deny, &use_killswitch);
}
if (NULL == ip_addr || NULL == ip_port) {
err = 1;
fprintf(stderr, "both --addr and --port are required.\n");
}
if (NULL == file) {
err = 1;
fprintf(stderr, "--file is required\n");
}
if (err) {
exit_err(serve_help_text);
}
flexnbd =
flexnbd_create_serving(ip_addr, ip_port, file, sock, default_deny,
argc - optind, argv + optind,
MAX_NBD_CLIENTS, use_killswitch);
info("Serving file %s", file);
success = flexnbd_serve(flexnbd);
flexnbd_destroy(flexnbd);
return success ? 0 : 1;
}
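/* Illustrative sketch, not part of flexnbd: the option loop and the
 * "both --addr and --port are required" check used by mode_serve()
 * can be reduced to the pattern below. The struct, the function name
 * and the "default-deny" long option spelling are assumptions made for
 * the example; only the short options 'l', 'p' and 'd' are taken from
 * the code above.
 */

```c
#include <getopt.h>
#include <stddef.h>

/* Hypothetical container for the parsed options. */
struct example_opts {
    const char *addr;
    const char *port;
    int default_deny;
};

/* Returns 0 when both required options were supplied, -1 otherwise. */
int example_parse_serve_opts(int argc, char *argv[],
                             struct example_opts *out)
{
    static struct option long_options[] = {
        {"addr", required_argument, NULL, 'l'},
        {"port", required_argument, NULL, 'p'},
        {"default-deny", no_argument, NULL, 'd'},
        {NULL, 0, NULL, 0}
    };
    int c;

    out->addr = NULL;
    out->port = NULL;
    out->default_deny = 0;

    /* Restart scanning at argv[1] so the function can be called twice. */
    optind = 1;
    while ((c = getopt_long(argc, argv, "l:p:d", long_options, NULL)) != -1) {
        switch (c) {
        case 'l':
            out->addr = optarg;
            break;
        case 'p':
            out->port = optarg;
            break;
        case 'd':
            out->default_deny = 1;
            break;
        default:
            return -1;          /* unknown option */
        }
    }
    return (out->addr != NULL && out->port != NULL) ? 0 : -1;
}
```

/* Unlike mode_serve(), the sketch reports a missing option through its
 * return value rather than exiting, which keeps it easy to test.
 */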
int mode_listen(int argc, char *argv[])
{
int c;
char *ip_addr = NULL;
char *ip_port = NULL;
char *file = NULL;
char *sock = NULL;
int default_deny = 0; // not on by default
int err = 0;
int success;
struct flexnbd *flexnbd;
while (1) {
c = getopt_long(argc, argv, listen_short_options, listen_options,
NULL);
if (c == -1) {
break;
}
read_listen_param(c, &ip_addr, &ip_port,
&file, &sock, &default_deny);
}
if (NULL == ip_addr || NULL == ip_port) {
err = 1;
fprintf(stderr, "both --addr and --port are required.\n");
}
if (NULL == file) {
err = 1;
fprintf(stderr, "--file is required\n");
}
if (err) {
exit_err(listen_help_text);
}
flexnbd = flexnbd_create_listening(ip_addr,
ip_port,
file,
sock,
default_deny,
argc - optind, argv + optind);
success = flexnbd_serve(flexnbd);
flexnbd_destroy(flexnbd);
return success ? 0 : 1;
}
/* TODO: Separate this function.
* It should be:
* params_read( struct mode_readwrite_params* out,
* char *s_ip_address,
* char *s_port,
* char *s_from,
* char *s_length )
* params_write( struct mode_readwrite_params* out,
* char *s_ip_address,
* char *s_port,
* char *s_from,
* char *s_length,
* char *s_filename )
*/
void params_readwrite(int write_not_read,
struct mode_readwrite_params *out,
char *s_ip_address,
char *s_port,
char *s_bind_address,
char *s_from, char *s_length_or_filename)
{
FATAL_IF_NULL(s_ip_address, "No IP address supplied");
FATAL_IF_NULL(s_port, "No port number supplied");
FATAL_IF_NULL(s_from, "No from supplied");
FATAL_IF_NULL(s_length_or_filename, "No length supplied");
FATAL_IF_ZERO(parse_ip_to_sockaddr
(&out->connect_to.generic, s_ip_address),
"Couldn't parse connection address '%s'", s_ip_address);
if (s_bind_address != NULL &&
parse_ip_to_sockaddr(&out->connect_from.generic,
s_bind_address) == 0) {
fatal("Couldn't parse bind address '%s'", s_bind_address);
}
parse_port(s_port, &out->connect_to.v4);
long signed_from = atol(s_from);
FATAL_IF_NEGATIVE(signed_from,
"Can't read from a negative offset %ld.",
signed_from);
out->from = signed_from;
if (write_not_read) {
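/* A leading ASCII digit means a literal length; anything else
* (including '/' and '.', which sort below '0') is a filename.
*/
if (s_length_or_filename[0] >= '0' && s_length_or_filename[0] <= '9') {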
out->len = atol(s_length_or_filename);
out->data_fd = 0;
} else {
out->data_fd = open(s_length_or_filename, O_RDONLY);
FATAL_IF_NEGATIVE(out->data_fd,
"Couldn't open %s", s_length_or_filename);
off64_t signed_len = lseek64(out->data_fd, 0, SEEK_END);
FATAL_IF_NEGATIVE(signed_len,
"Couldn't find length of %s",
s_length_or_filename);
out->len = signed_len;
FATAL_IF_NEGATIVE(lseek64(out->data_fd, 0, SEEK_SET),
"Couldn't rewind %s", s_length_or_filename);
}
} else {
out->len = atol(s_length_or_filename);
out->data_fd = 1;
}
}
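/* Illustrative sketch, not part of flexnbd: atol() as used in
 * params_readwrite() returns 0 for input it cannot parse, so a typo in
 * --from or --size goes unnoticed. A stricter parser built on strtoll()
 * might look like this. example_parse_offset is a hypothetical helper
 * name.
 */

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: parse a non-negative decimal byte offset.
 * Returns 0 and stores the value in *out on success, -1 otherwise. */
int example_parse_offset(const char *s, uint64_t *out)
{
    char *end = NULL;
    long long value;

    if (s == NULL || *s == '\0') {
        return -1;
    }

    errno = 0;
    value = strtoll(s, &end, 10);

    /* Reject overflow, empty conversions, trailing junk and negatives. */
    if (errno != 0 || end == s || *end != '\0' || value < 0) {
        return -1;
    }

    *out = (uint64_t) value;
    return 0;
}
```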
int mode_read(int argc, char *argv[])
{
int c;
char *ip_addr = NULL;
char *ip_port = NULL;
char *bind_addr = NULL;
char *from = NULL;
char *size = NULL;
int err = 0;
struct mode_readwrite_params readwrite;
while (1) {
c = getopt_long(argc, argv, read_short_options, read_options,
NULL);
if (c == -1) {
break;
}
read_readwrite_param(c, &ip_addr, &ip_port, &bind_addr, &from,
&size, read_help_text);
}
if (NULL == ip_addr || NULL == ip_port) {
err = 1;
fprintf(stderr, "both --addr and --port are required.\n");
}
if (NULL == from || NULL == size) {
err = 1;
fprintf(stderr, "both --from and --size are required.\n");
}
if (err) {
exit_err(read_help_text);
}
memset(&readwrite, 0, sizeof(readwrite));
params_readwrite(0, &readwrite, ip_addr, ip_port, bind_addr, from,
size);
do_read(&readwrite);
return 0;
}
int mode_write(int argc, char *argv[])
{
int c;
char *ip_addr = NULL;
char *ip_port = NULL;
char *bind_addr = NULL;
char *from = NULL;
char *size = NULL;
int err = 0;
struct mode_readwrite_params readwrite;
while (1) {
c = getopt_long(argc, argv, write_short_options, write_options,
NULL);
if (c == -1) {
break;
}
read_readwrite_param(c, &ip_addr, &ip_port, &bind_addr, &from,
&size, write_help_text);
}
if (NULL == ip_addr || NULL == ip_port) {
err = 1;
fprintf(stderr, "both --addr and --port are required.\n");
}
if (NULL == from || NULL == size) {
err = 1;
fprintf(stderr, "both --from and --size are required.\n");
}
if (err) {
exit_err(write_help_text);
}
memset(&readwrite, 0, sizeof(readwrite));
params_readwrite(1, &readwrite, ip_addr, ip_port, bind_addr, from,
size);
do_write(&readwrite);
return 0;
}
int mode_acl(int argc, char *argv[])
{
int c;
char *sock = NULL;
while (1) {
c = getopt_long(argc, argv, acl_short_options, acl_options, NULL);
if (c == -1) {
break;
}
read_acl_param(c, &sock);
}
if (NULL == sock) {
fprintf(stderr, "--sock is required.\n");
exit_err(acl_help_text);
}
/* Don't use the CMD_ACL macro here, "acl" is the remote command
* name, not the cli option
*/
do_remote_command("acl", sock, argc - optind, argv + optind);
return 0;
}
int mode_mirror_speed(int argc, char *argv[])
{
int c;
char *sock = NULL;
char *speed = NULL;
while (1) {
c = getopt_long(argc, argv, mirror_speed_short_options,
mirror_speed_options, NULL);
if (-1 == c) {
break;
}
read_mirror_speed_param(c, &sock, &speed);
}
if (NULL == sock) {
fprintf(stderr, "--sock is required.\n");
exit_err(mirror_speed_help_text);
}
if (NULL == speed) {
fprintf(stderr, "--max-speed is required.\n");
exit_err(mirror_speed_help_text);
}
do_remote_command("mirror_max_bps", sock, 1, &speed);
return 0;
}
int mode_mirror(int argc, char *argv[])
{
int c;
char *sock = NULL;
char *remote_argv[4] = { 0 };
int err = 0;
int unlink = 0;
remote_argv[2] = "exit";
while (1) {
c = getopt_long(argc, argv, mirror_short_options, mirror_options,
NULL);
if (-1 == c) {
break;
}
read_mirror_param(c,
&sock,
&remote_argv[0],
&remote_argv[1], &unlink, &remote_argv[3]);
}
if (NULL == sock) {
fprintf(stderr, "--sock is required.\n");
err = 1;
}
if (NULL == remote_argv[0] || NULL == remote_argv[1]) {
fprintf(stderr, "both --addr and --port are required.\n");
err = 1;
}
if (err) {
exit_err(mirror_help_text);
}
if (unlink) {
remote_argv[2] = "unlink";
}
if (remote_argv[3] == NULL) {
do_remote_command("mirror", sock, 3, remote_argv);
} else {
do_remote_command("mirror", sock, 4, remote_argv);
}
return 0;
}
int mode_break(int argc, char *argv[])
{
int c;
char *sock = NULL;
while (1) {
c = getopt_long(argc, argv, break_short_options, break_options,
NULL);
if (-1 == c) {
break;
}
read_break_param(c, &sock);
}
if (NULL == sock) {
fprintf(stderr, "--sock is required.\n");
exit_err(break_help_text);
}
do_remote_command("break", sock, argc - optind, argv + optind);
return 0;
}
int mode_status(int argc, char *argv[])
{
int c;
char *sock = NULL;
while (1) {
c = getopt_long(argc, argv, status_short_options, status_options,
NULL);
if (-1 == c) {
break;
}
read_status_param(c, &sock);
}
if (NULL == sock) {
fprintf(stderr, "--sock is required.\n");
exit_err(status_help_text);
}
do_remote_command("status", sock, argc - optind, argv + optind);
return 0;
}
int mode_help(int argc, char *argv[])
{
char *cmd;
char *help_text = NULL;
if (argc < 1) {
help_text = help_help_text;
} else {
cmd = argv[0];
if (IS_CMD(CMD_SERVE, cmd)) {
help_text = serve_help_text;
} else if (IS_CMD(CMD_LISTEN, cmd)) {
help_text = listen_help_text;
} else if (IS_CMD(CMD_READ, cmd)) {
help_text = read_help_text;
} else if (IS_CMD(CMD_WRITE, cmd)) {
help_text = write_help_text;
} else if (IS_CMD(CMD_ACL, cmd)) {
help_text = acl_help_text;
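} else if (IS_CMD(CMD_MIRROR_SPEED, cmd)) {
help_text = mirror_speed_help_text;
} else if (IS_CMD(CMD_MIRROR, cmd)) {
help_text = mirror_help_text;
} else if (IS_CMD(CMD_BREAK, cmd)) {
help_text = break_help_text;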
} else if (IS_CMD(CMD_STATUS, cmd)) {
help_text = status_help_text;
} else {
exit_err(help_help_text);
}
}
fprintf(stdout, "%s\n", help_text);
return 0;
}
void mode(char *mode, int argc, char **argv)
{
if (IS_CMD(CMD_SERVE, mode)) {
exit(mode_serve(argc, argv));
} else if (IS_CMD(CMD_LISTEN, mode)) {
exit(mode_listen(argc, argv));
} else if (IS_CMD(CMD_READ, mode)) {
mode_read(argc, argv);
} else if (IS_CMD(CMD_WRITE, mode)) {
mode_write(argc, argv);
} else if (IS_CMD(CMD_ACL, mode)) {
mode_acl(argc, argv);
} else if (IS_CMD(CMD_MIRROR_SPEED, mode)) {
mode_mirror_speed(argc, argv);
} else if (IS_CMD(CMD_MIRROR, mode)) {
mode_mirror(argc, argv);
} else if (IS_CMD(CMD_BREAK, mode)) {
mode_break(argc, argv);
} else if (IS_CMD(CMD_STATUS, mode)) {
mode_status(argc, argv);
} else if (IS_CMD(CMD_HELP, mode)) {
mode_help(argc - 1, argv + 1);
} else {
mode_help(argc - 1, argv + 1);
exit(1);
}
exit(0);
}

988
src/server/serve.c Normal file

@@ -0,0 +1,988 @@
#include "serve.h"
#include "client.h"
#include "nbdtypes.h"
#include "ioutil.h"
#include "sockutil.h"
#include "util.h"
#include "bitset.h"
#include "control.h"
#include "self_pipe.h"
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>
#include <sys/un.h>
#include <fcntl.h>
#include <string.h>
#include <stdlib.h>
#include <errno.h>
#include <sys/socket.h>
#include <netinet/tcp.h>
struct server *server_create(struct flexnbd *flexnbd,
char *s_ip_address,
char *s_port,
char *s_file,
int default_deny,
int acl_entries,
char **s_acl_entries,
int max_nbd_clients,
int use_killswitch, int success)
{
NULLCHECK(flexnbd);
struct server *out;
out = xmalloc(sizeof(struct server));
out->flexnbd = flexnbd;
out->success = success;
out->max_nbd_clients = max_nbd_clients;
out->use_killswitch = use_killswitch;
server_allow_new_clients(out);
out->nbd_client =
xmalloc(max_nbd_clients * sizeof(struct client_tbl_entry));
out->tcp_backlog = 10; /* does this need to be settable? */
FATAL_IF_NULL(s_ip_address, "No IP address supplied");
FATAL_IF_NULL(s_port, "No port number supplied");
FATAL_IF_NULL(s_file, "No filename supplied");
NULLCHECK(s_ip_address);
FATAL_IF_ZERO(parse_ip_to_sockaddr
(&out->bind_to.generic, s_ip_address),
"Couldn't parse server address '%s' (use 0 if "
"you want to bind to all IPs)", s_ip_address);
out->acl = acl_create(acl_entries, s_acl_entries, default_deny);
if (out->acl && out->acl->len != acl_entries) {
fatal("Bad ACL entry '%s'", s_acl_entries[out->acl->len]);
}
parse_port(s_port, &out->bind_to.v4);
out->filename = s_file;
out->l_acl = flexthread_mutex_create();
out->l_start_mirror = flexthread_mutex_create();
out->mirror_can_start = 1;
out->close_signal = self_pipe_create();
out->acl_updated_signal = self_pipe_create();
NULLCHECK(out->close_signal);
NULLCHECK(out->acl_updated_signal);
log_context = s_file;
return out;
}
void server_destroy(struct server *serve)
{
self_pipe_destroy(serve->acl_updated_signal);
serve->acl_updated_signal = NULL;
self_pipe_destroy(serve->close_signal);
serve->close_signal = NULL;
flexthread_mutex_destroy(serve->l_start_mirror);
flexthread_mutex_destroy(serve->l_acl);
if (serve->acl) {
acl_destroy(serve->acl);
serve->acl = NULL;
}
free(serve->nbd_client);
free(serve);
}
void server_unlink(struct server *serve)
{
NULLCHECK(serve);
NULLCHECK(serve->filename);
FATAL_IF_NEGATIVE(unlink(serve->filename),
"Failed to unlink %s: %s",
serve->filename, strerror(errno));
}
#define SERVER_LOCK( s, f, msg ) \
do { NULLCHECK( s ); \
FATAL_IF( 0 != flexthread_mutex_lock( s->f ), msg ); } while (0)
#define SERVER_UNLOCK( s, f, msg ) \
do { NULLCHECK( s ); \
FATAL_IF( 0 != flexthread_mutex_unlock( s->f ), msg ); } while (0)
void server_lock_acl(struct server *serve)
{
debug("ACL locking");
SERVER_LOCK(serve, l_acl, "Problem with ACL lock");
}
void server_unlock_acl(struct server *serve)
{
debug("ACL unlocking");
SERVER_UNLOCK(serve, l_acl, "Problem with ACL unlock");
}
int server_acl_locked(struct server *serve)
{
NULLCHECK(serve);
return flexthread_mutex_held(serve->l_acl);
}
void server_lock_start_mirror(struct server *serve)
{
debug("Mirror start locking");
SERVER_LOCK(serve, l_start_mirror, "Problem with start mirror lock");
}
void server_unlock_start_mirror(struct server *serve)
{
debug("Mirror start unlocking");
SERVER_UNLOCK(serve, l_start_mirror,
"Problem with start mirror unlock");
}
int server_start_mirror_locked(struct server *serve)
{
NULLCHECK(serve);
return flexthread_mutex_held(serve->l_start_mirror);
}
/** Return the actual port the server bound to. This is used because we
* are allowed to pass "0" on the command-line.
*/
int server_port(struct server *server)
{
NULLCHECK(server);
union mysockaddr addr;
socklen_t len = sizeof(addr.v4);
if (getsockname(server->server_fd, &addr.generic, &len) < 0) {
fatal("Failed to get the port number.");
}
return be16toh(addr.v4.sin_port);
}
/** Prepares a listening socket for the NBD server, binding etc. */
void serve_open_server_socket(struct server *params)
{
NULLCHECK(params);
params->server_fd =
socket(params->bind_to.generic.sa_family ==
AF_INET ? PF_INET : PF_INET6, SOCK_STREAM, 0);
FATAL_IF_NEGATIVE(params->server_fd, "Couldn't create server socket");
/* We need SO_REUSEADDR so that when we switch from listening to
* serving we don't have to change address if we don't want to.
*
* If this fails, it's not necessarily bad in principle, but at
* this point in the code we can't tell if it's going to be a
* problem. It's also indicative of something odd going on, so
* we barf.
*/
FATAL_IF_NEGATIVE(sock_set_reuseaddr(params->server_fd, 1),
"Couldn't set SO_REUSEADDR");
/* TCP_NODELAY makes everything not be slow. If we can't set
* this, again, there's something odd going on which we don't
* understand.
*/
FATAL_IF_NEGATIVE(sock_set_tcp_nodelay(params->server_fd, 1),
"Couldn't set TCP_NODELAY");
/* If we can't bind, presumably that's because someone else is
* squatting on our ip/port combo, or the ip isn't yet
* configured. Ideally we want to retry this. */
FATAL_UNLESS_ZERO(sock_try_bind
(params->server_fd, &params->bind_to.generic),
SHOW_ERRNO("Failed to bind() socket")
);
FATAL_IF_NEGATIVE(listen(params->server_fd, params->tcp_backlog),
"Couldn't listen on server socket");
}
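/* Illustrative sketch, not part of flexnbd: the two socket options
 * discussed in serve_open_server_socket() can be set with plain
 * setsockopt() calls. The helper names below are hypothetical; the
 * real sock_set_reuseaddr() and sock_set_tcp_nodelay() live in
 * sockutil and may differ in detail.
 */

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: set an integer-valued socket option. */
int example_set_int_sockopt(int fd, int level, int optname, int value)
{
    return setsockopt(fd, level, optname, &value, sizeof(value));
}

/* Apply both options to a TCP socket before bind(); 0 on success. */
int example_prepare_listener(int fd)
{
    /* Allow rebinding the same address when switching modes. */
    if (example_set_int_sockopt(fd, SOL_SOCKET, SO_REUSEADDR, 1) < 0) {
        return -1;
    }
    /* Disable Nagle's algorithm so small replies are sent promptly. */
    if (example_set_int_sockopt(fd, IPPROTO_TCP, TCP_NODELAY, 1) < 0) {
        return -1;
    }
    return 0;
}
```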
int tryjoin_client_thread(struct client_tbl_entry *entry,
int (*joinfunc) (pthread_t, void **))
{
NULLCHECK(entry);
NULLCHECK(joinfunc);
int was_closed = 0;
void *status = NULL;
if (entry->thread != 0) {
char s_client_address[128];
sockaddr_address_string(&entry->address.generic,
&s_client_address[0], 128);
debug("%s(%p,...)",
joinfunc == pthread_join ? "joining" : "tryjoining",
entry->thread);
int join_errno = joinfunc(entry->thread, &status);
/* join_errno can legitimately be ESRCH if the thread is
* already dead, but the client still needs tidying up. */
if (join_errno != 0 && !entry->client->stopped) {
debug("join_errno was %s, stopped was %d",
strerror(join_errno), entry->client->stopped);
FATAL_UNLESS(join_errno == EBUSY,
"Problem with joining thread %p: %s",
entry->thread, strerror(join_errno));
} else if (join_errno == 0) {
debug("nbd thread %016lx exited (%s) with status %ld",
(unsigned long) entry->thread, s_client_address,
(long) (intptr_t) status);
client_destroy(entry->client);
entry->client = NULL;
entry->thread = 0;
was_closed = 1;
}
}
return was_closed;
}
/**
* Check to see if a client thread has finished, and if so, tidy up
* after it.
* Returns 1 if the thread was cleaned up and the slot freed, 0
* otherwise.
*
* It's important that client_destroy gets called in the same thread
* which signals the client threads to stop. This avoids the
* possibility of sending a stop signal via a signal which has already
* been destroyed. However, it means that stopped client threads,
* including their signal pipes, won't be cleaned up until the next new
* client connection attempt.
*/
int cleanup_client_thread(struct client_tbl_entry *entry)
{
return tryjoin_client_thread(entry, pthread_tryjoin_np);
}
void cleanup_client_threads(struct client_tbl_entry *entries,
size_t entries_len)
{
size_t i;
for (i = 0; i < entries_len; i++) {
cleanup_client_thread(&entries[i]);
}
}
/**
* Join a client thread after having sent a stop signal to it.
* This function will not return until pthread_join has returned, so
* ensures that the client thread is dead.
*/
int join_client_thread(struct client_tbl_entry *entry)
{
return tryjoin_client_thread(entry, pthread_join);
}
/** We can only accommodate MAX_NBD_CLIENTS connections at once. This function
* goes through the current list, waits for any threads that have finished
* and returns the next slot free (or -1 if there are none).
*/
int cleanup_and_find_client_slot(struct server *params)
{
NULLCHECK(params);
int slot = -1, i;
cleanup_client_threads(params->nbd_client, params->max_nbd_clients);
for (i = 0; i < params->max_nbd_clients; i++) {
if (params->nbd_client[i].thread == 0 && slot == -1) {
slot = i;
break;
}
}
return slot;
}
int server_count_clients(struct server *params)
{
NULLCHECK(params);
int i, count = 0;
cleanup_client_threads(params->nbd_client, params->max_nbd_clients);
for (i = 0; i < params->max_nbd_clients; i++) {
if (params->nbd_client[i].thread != 0) {
count++;
}
}
return count;
}
/** Check whether the address client_address is allowed or not according
* to the current acl. If params->acl is NULL, the result will be 1,
* otherwise it will be the result of acl_includes().
*/
int server_acl_accepts(struct server *params,
union mysockaddr *client_address)
{
NULLCHECK(params);
NULLCHECK(client_address);
struct acl *acl;
int accepted;
server_lock_acl(params);
{
acl = params->acl;
accepted = acl ? acl_includes(acl, client_address) : 1;
}
server_unlock_acl(params);
return accepted;
}
int server_should_accept_client(struct server *params,
union mysockaddr *client_address,
char *s_client_address,
size_t s_client_address_len)
{
NULLCHECK(params);
NULLCHECK(client_address);
NULLCHECK(s_client_address);
const char *result =
sockaddr_address_string(&client_address->generic, s_client_address,
s_client_address_len);
if (NULL == result) {
warn("Rejecting client %s: Bad client_address", s_client_address);
return 0;
}
if (!server_acl_accepts(params, client_address)) {
warn("Rejecting client %s: Access control error",
s_client_address);
debug("We %s have an acl, and default_deny is %s",
(params->acl ? "do" : "do not"),
((params->acl && params->acl->default_deny) ? "true" : "false"));
return 0;
}
return 1;
}
int spawn_client_thread(struct client *client_params,
pthread_t * out_thread)
{
int result =
pthread_create(out_thread, NULL, client_serve, client_params);
return result;
}
/** Dispatch function for accepting an NBD connection and starting a thread
* to handle it. Rejects the connection if there is an ACL, and the far end's
* address doesn't match, or if there are too many clients already connected.
*/
void accept_nbd_client(struct server *params,
int client_fd, union mysockaddr *client_address)
{
NULLCHECK(params);
NULLCHECK(client_address);
struct client *client_params;
int slot;
char s_client_address[64] = { 0 };
FATAL_IF_NEGATIVE(sock_set_keepalive_params
(client_fd, CLIENT_KEEPALIVE_TIME,
CLIENT_KEEPALIVE_INTVL, CLIENT_KEEPALIVE_PROBES),
"Error setting keepalive parameters on client socket fd %d",
client_fd);
if (!server_should_accept_client
(params, client_address, s_client_address, 64)) {
FATAL_IF_NEGATIVE(close(client_fd),
"Error closing client socket fd %d", client_fd);
debug("Closed client socket fd %d", client_fd);
return;
}
slot = cleanup_and_find_client_slot(params);
if (slot < 0) {
warn("too many clients to accept connection");
FATAL_IF_NEGATIVE(close(client_fd),
"Error closing client socket fd %d", client_fd);
debug("Closed client socket fd %d", client_fd);
return;
}
info("Client %s accepted on fd %d.", s_client_address, client_fd);
client_params = client_create(params, client_fd);
params->nbd_client[slot].client = client_params;
memcpy(&params->nbd_client[slot].address, client_address,
sizeof(union mysockaddr));
pthread_t *thread = &params->nbd_client[slot].thread;
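if (0 != spawn_client_thread(client_params, thread)) {
debug("Thread creation problem.");
client_destroy(client_params);
/* pthread_create() leaves *thread undefined on failure, so reset
* the slot rather than leave it pointing at a destroyed client.
*/
params->nbd_client[slot].client = NULL;
params->nbd_client[slot].thread = 0;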
FATAL_IF_NEGATIVE(close(client_fd),
"Error closing client socket fd %d", client_fd);
debug("Closed client socket fd %d", client_fd);
return;
}
debug("nbd thread %p started (%s)", params->nbd_client[slot].thread,
s_client_address);
}
void server_audit_clients(struct server *serve)
{
NULLCHECK(serve);
int i;
struct client_tbl_entry *entry;
/* There's an apparent race here. If the acl updates while
* we're traversing the nbd_clients array, the earlier entries
* won't have been audited against the later acl. This isn't a
* problem though, because in order to update the acl
* server_replace_acl must have been called, so the
* server_accept loop will see a second acl_updated signal as
* soon as it hits select, and a second audit will be run.
*/
for (i = 0; i < serve->max_nbd_clients; i++) {
entry = &serve->nbd_client[i];
if (0 == entry->thread) {
continue;
}
if (server_acl_accepts(serve, &entry->address)) {
continue;
}
client_signal_stop(entry->client);
}
}
int server_is_closed(struct server *serve)
{
NULLCHECK(serve);
return fd_is_closed(serve->server_fd);
}
void server_close_clients(struct server *params)
{
NULLCHECK(params);
info("closing all clients");
int i;
struct client_tbl_entry *entry;
for (i = 0; i < params->max_nbd_clients; i++) {
entry = &params->nbd_client[i];
if (entry->thread != 0) {
debug("Stop signaling client %p", entry->client);
client_signal_stop(entry->client);
}
}
/* We don't join the clients here. When we enter the final
* mirror pass, we get the IO lock, then wait for the server_fd
* to close before sending the data, to be sure that no new
* clients can be accepted which might think they've written
* to the disc. However, an existing client thread can be
* waiting for the IO lock already, so if we try to join it
* here, we deadlock.
*
* The client threads will be joined in serve_cleanup.
*
*/
}
/** Replace the current acl with a new one. The old one will be thrown
* away.
*/
void server_replace_acl(struct server *serve, struct acl *new_acl)
{
NULLCHECK(serve);
NULLCHECK(new_acl);
/* We need to lock around updates to the acl in case we try to
* destroy the old acl while checking against it.
*/
server_lock_acl(serve);
{
struct acl *old_acl = serve->acl;
serve->acl = new_acl;
/* We should always have an old_acl, but just in case... */
if (old_acl) {
acl_destroy(old_acl);
}
}
server_unlock_acl(serve);
self_pipe_signal(serve->acl_updated_signal);
}
void server_prevent_mirror_start(struct server *serve)
{
NULLCHECK(serve);
serve->mirror_can_start = 0;
}
void server_allow_mirror_start(struct server *serve)
{
NULLCHECK(serve);
serve->mirror_can_start = 1;
}
/* Only call this with the mirror start lock held */
int server_mirror_can_start(struct server *serve)
{
NULLCHECK(serve);
return serve->mirror_can_start;
}
/* Queries to see if we are currently mirroring. If we are, we need
* to communicate that via the process exit status, because otherwise
* the supervisor will assume the migration completed.
*/
int serve_shutdown_is_graceful(struct server *params)
{
int is_mirroring = 0;
server_lock_start_mirror(params);
{
if (server_is_mirroring(params)) {
is_mirroring = 1;
warn("Stop signal received while mirroring.");
server_prevent_mirror_start(params);
}
}
server_unlock_start_mirror(params);
return !is_mirroring;
}
/** Accept either an NBD or control socket connection, dispatch appropriately */
int server_accept(struct server *params)
{
NULLCHECK(params);
debug("accept loop starting");
union mysockaddr client_address;
fd_set fds;
socklen_t socklen = sizeof(client_address);
/* We select on this fd to receive OS signals (only a few of
* which we're interested in; see flexnbd.c). */
int signal_fd = flexnbd_signal_fd(params->flexnbd);
int should_continue = 1;
FD_ZERO(&fds);
FD_SET(params->server_fd, &fds);
if (0 < signal_fd) {
FD_SET(signal_fd, &fds);
}
self_pipe_fd_set(params->close_signal, &fds);
self_pipe_fd_set(params->acl_updated_signal, &fds);
FATAL_IF_NEGATIVE(sock_try_select(FD_SETSIZE, &fds, NULL, NULL, NULL),
SHOW_ERRNO("select() failed")
);
if (self_pipe_fd_isset(params->close_signal, &fds)) {
server_close_clients(params);
should_continue = 0;
}
if (0 < signal_fd && FD_ISSET(signal_fd, &fds)) {
debug("Stop signal received.");
server_close_clients(params);
params->success = params->success
&& serve_shutdown_is_graceful(params);
should_continue = 0;
}
if (self_pipe_fd_isset(params->acl_updated_signal, &fds)) {
self_pipe_signal_clear(params->acl_updated_signal);
server_audit_clients(params);
}
if (FD_ISSET(params->server_fd, &fds)) {
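int client_fd =
accept(params->server_fd, &client_address.generic, &socklen);
if (client_fd < 0) {
/* accept() can fail transiently (e.g. ECONNABORTED); don't
* pass an invalid fd on to the client setup code.
*/
warn("accept() failed: %s", strerror(errno));
} else if (params->allow_new_clients) {
debug("Accepted nbd client socket fd %d", client_fd);
accept_nbd_client(params, client_fd, &client_address);
} else {
debug("New NBD client socket %d not allowed", client_fd);
sock_try_close(client_fd);
}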
}
return should_continue;
}
void serve_accept_loop(struct server *params)
{
NULLCHECK(params);
while (server_accept(params));
}
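/* Illustrative sketch, not part of flexnbd: server_accept() relies on
 * "self pipes" so that another thread can wake up select(). The idea
 * is to write one byte to a pipe whose read end is in the fd_set. The
 * example_pipe names are hypothetical stand-ins for the real
 * self_pipe_* functions, whose implementation is not shown here.
 */

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical stand-in for flexnbd's struct self_pipe. */
struct example_pipe {
    int read_fd;
    int write_fd;
};

int example_pipe_create(struct example_pipe *p)
{
    int fds[2];
    if (pipe(fds) < 0) {
        return -1;
    }
    p->read_fd = fds[0];
    p->write_fd = fds[1];
    return 0;
}

/* Wake up anyone selecting on the read end. */
int example_pipe_signal(struct example_pipe *p)
{
    char byte = 'X';
    return write(p->write_fd, &byte, 1) == 1 ? 0 : -1;
}

/* Consume one pending signal byte. */
int example_pipe_clear(struct example_pipe *p)
{
    char byte;
    return read(p->read_fd, &byte, 1) == 1 ? 0 : -1;
}

/* Poll with a zero timeout: 1 if signalled, 0 if not, -1 on error. */
int example_pipe_is_signalled(struct example_pipe *p)
{
    fd_set fds;
    struct timeval no_wait = { 0, 0 };

    FD_ZERO(&fds);
    FD_SET(p->read_fd, &fds);
    if (select(p->read_fd + 1, &fds, NULL, NULL, &no_wait) < 0) {
        return -1;
    }
    return FD_ISSET(p->read_fd, &fds) ? 1 : 0;
}
```

/* A signal stays visible to select() until it is cleared, which is why
 * server_accept() clears acl_updated_signal before running the audit.
 */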
void *build_allocation_map_thread(void *serve_uncast)
{
NULLCHECK(serve_uncast);
struct server *serve = (struct server *) serve_uncast;
NULLCHECK(serve->filename);
NULLCHECK(serve->allocation_map);
int fd = open(serve->filename, O_RDONLY);
FATAL_IF_NEGATIVE(fd, "Couldn't open %s", serve->filename);
if (build_allocation_map(serve->allocation_map, fd)) {
serve->allocation_map_built = 1;
} else {
/* We can operate without it, but we can't free it without a race.
* All that happens if we leave it is that it gradually builds up an
* *incomplete* record of writes. Nobody will use it, as
* allocation_map_built == 0 for the lifetime of the process.
*
* The stream functionality can still be relied on. We don't need to
* worry about mirroring waiting for the allocation map to finish,
* because we already copy every byte at least once. If that changes in
* the future, we'll need to wait for the allocation map to finish or
* fail before we can complete the migration.
*/
serve->allocation_map_not_built = 1;
warn("Didn't build allocation map for %s", serve->filename);
}
close(fd);
return NULL;
}
/** Initialisation function that sets up the initial allocation map, i.e. so
* we know which blocks of the file are allocated.
*/
void serve_init_allocation_map(struct server *params)
{
NULLCHECK(params);
NULLCHECK(params->filename);
int fd = open(params->filename, O_RDONLY);
off64_t size;
FATAL_IF_NEGATIVE(fd, "Couldn't open %s", params->filename);
size = lseek64(fd, 0, SEEK_END);
/* Check for failure before rounding, so that -1 isn't masked first. */
FATAL_IF_NEGATIVE(size, "Couldn't find size of %s", params->filename);
/* The size is all we needed this descriptor for. */
close(fd);
/* If discs are not in multiples of 512, then odd things happen,
* resulting in reads/writes past the ends of files.
*/
if (size != (size & ~0x1ff)) {
warn("file does not fit into 512-byte sectors; the end of the file will be ignored.");
size &= ~0x1ff;
}
params->size = size;
params->allocation_map =
bitset_alloc(params->size, block_allocation_resolution);
int ok = pthread_create(&params->allocation_map_builder_thread,
NULL,
build_allocation_map_thread,
params);
/* pthread_create() returns a positive error number on failure, never -1. */
FATAL_UNLESS_ZERO(ok, "Couldn't create thread");
}
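/* Illustrative sketch, not part of flexnbd: the `size & ~0x1ff` mask in
 * serve_init_allocation_map() rounds the file size down to a whole
 * number of 512-byte sectors. The helper names and the
 * EXAMPLE_SECTOR_SIZE constant are hypothetical.
 */

```c
#include <stdint.h>

#define EXAMPLE_SECTOR_SIZE 512

/* Clear the low 9 bits, i.e. round down to a multiple of 512. */
int64_t example_round_down_to_sector(int64_t size)
{
    return size & ~((int64_t) EXAMPLE_SECTOR_SIZE - 1);
}

/* 1 if the size is already a whole number of sectors, 0 otherwise. */
int example_is_sector_aligned(int64_t size)
{
    return size == example_round_down_to_sector(size);
}
```

/* For example, a 1000-byte file is served as 512 bytes and the trailing
 * 488 bytes are ignored.
 */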
void server_forbid_new_clients(struct server *serve)
{
serve->allow_new_clients = 0;
return;
}
void server_allow_new_clients(struct server *serve)
{
serve->allow_new_clients = 1;
return;
}
void server_join_clients(struct server *serve)
{
int i;
void *status;
for (i = 0; i < serve->max_nbd_clients; i++) {
pthread_t thread_id = serve->nbd_client[i].thread;
if (thread_id != 0) {
debug("joining thread %p", thread_id);
int err = pthread_join(thread_id, &status);
if (0 == err) {
serve->nbd_client[i].thread = 0;
} else {
warn("Error %s (%i) joining thread %p", strerror(err), err,
thread_id);
}
}
}
return;
}
/* Tell the server to close all the things. */
void serve_signal_close(struct server *serve)
{
NULLCHECK(serve);
info("signalling close");
self_pipe_signal(serve->close_signal);
}
/* Block until the server closes the server_fd.
*/
void serve_wait_for_close(struct server *serve)
{
while (!fd_is_closed(serve->server_fd)) {
usleep(10000);
}
}
/* We've just had a DISCONNECT pair, so we need to shut down
* and signal our listener that we can safely take over.
*/
void server_control_arrived(struct server *serve)
{
debug("server_control_arrived");
NULLCHECK(serve);
if (!serve->success) {
serve->success = 1;
serve_signal_close(serve);
}
}
void flexnbd_stop_control(struct flexnbd *flexnbd);
/** Closes sockets, frees memory and waits for all client threads to finish */
void serve_cleanup(struct server *params,
int fatal __attribute__ ((unused)))
{
NULLCHECK(params);
void *status;
info("cleaning up");
if (params->server_fd) {
close(params->server_fd);
}
/* need to stop background build if we're killed very early on */
pthread_cancel(params->allocation_map_builder_thread);
pthread_join(params->allocation_map_builder_thread, &status);
int need_mirror_lock;
need_mirror_lock = !server_start_mirror_locked(params);
if (need_mirror_lock) {
server_lock_start_mirror(params);
}
{
if (server_is_mirroring(params)) {
server_abandon_mirror(params);
}
server_prevent_mirror_start(params);
}
if (need_mirror_lock) {
server_unlock_start_mirror(params);
}
server_join_clients(params);
if (params->allocation_map) {
bitset_free(params->allocation_map);
}
if (server_start_mirror_locked(params)) {
server_unlock_start_mirror(params);
}
if (server_acl_locked(params)) {
server_unlock_acl(params);
}
/* if( params->flexnbd ) { */
/* if ( params->flexnbd->control ) { */
/* flexnbd_stop_control( params->flexnbd ); */
/* } */
/* flexnbd_destroy( params->flexnbd ); */
/* } */
/* server_destroy( params ); */
debug("Cleanup done");
}
int server_is_in_control(struct server *serve)
{
NULLCHECK(serve);
return serve->success;
}
int server_is_mirroring(struct server *serve)
{
NULLCHECK(serve);
return !!serve->mirror_super;
}
uint64_t server_mirror_bytes_remaining(struct server * serve)
{
if (server_is_mirroring(serve)) {
uint64_t bytes_to_xfer =
bitset_stream_queued_bytes(serve->allocation_map,
BITSET_STREAM_SET) +
(serve->size - serve->mirror->offset);
return bytes_to_xfer;
}
return 0;
}
/* Given historic bps measurements and number of bytes left to transfer, give
* an estimate of how many seconds are remaining before the migration is
* complete, assuming no new bytes are written.
*/
uint64_t server_mirror_eta(struct server * serve)
{
if (server_is_mirroring(serve)) {
uint64_t bytes_to_xfer = server_mirror_bytes_remaining(serve);
return bytes_to_xfer / (server_mirror_bps(serve) + 1);
}
return 0;
}
uint64_t server_mirror_bps(struct server * serve)
{
if (server_is_mirroring(serve)) {
uint64_t duration_ms =
monotonic_time_ms() - serve->mirror->migration_started;
return serve->mirror->all_dirty / ((duration_ms / 1000) + 1);
}
return 0;
}
void mirror_super_destroy(struct mirror_super *super);
/* This must only be called with the start_mirror lock held */
void server_abandon_mirror(struct server *serve)
{
NULLCHECK(serve);
if (serve->mirror_super) {
/* FIXME: AWOOGA! RACE!
* We can set abandon_signal after mirror_super has checked it, but
* before the reset. However, mirror_reset doesn't clear abandon_signal
* so it'll just terminate early on the next pass. */
ERROR_UNLESS(self_pipe_signal(serve->mirror->abandon_signal),
"Failed to signal abandon to mirror");
pthread_t tid = serve->mirror_super->thread;
pthread_join(tid, NULL);
debug("Mirror thread %p pthread_join returned", tid);
server_allow_mirror_start(serve);
mirror_super_destroy(serve->mirror_super);
serve->mirror = NULL;
serve->mirror_super = NULL;
debug("Mirror supervisor done.");
}
}
int server_default_deny(struct server *serve)
{
NULLCHECK(serve);
return acl_default_deny(serve->acl);
}
/** Full lifecycle of the server */
int do_serve(struct server *params, struct self_pipe *open_signal)
{
NULLCHECK(params);
int success;
error_set_handler((cleanup_handler *) serve_cleanup, params);
serve_open_server_socket(params);
/* Only signal that we are open for business once the server
socket is open */
if (NULL != open_signal) {
self_pipe_signal(open_signal);
}
serve_init_allocation_map(params);
serve_accept_loop(params);
success = params->success;
serve_cleanup(params, 0);
return success;
}
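
The rate and ETA arithmetic in server_mirror_bps() and server_mirror_eta() above can be sketched in isolation. This is a minimal illustration, not the project's code: the function and parameter names are hypothetical, and `bytes_sent` only stands in for whatever `serve->mirror->all_dirty` counts. It shows how the `+ 1` in each divisor avoids division by zero, at the cost of slightly biased estimates.

```c
#include <stdint.h>

/* Hypothetical stand-alone helpers; names and parameters are illustrative
 * and do not exist in flexnbd. */

/* Average bytes per second since the migration started. Whole seconds are
 * used, and the +1 keeps the divisor non-zero during the first second. */
uint64_t sketch_bps(uint64_t bytes_sent, uint64_t duration_ms)
{
    return bytes_sent / ((duration_ms / 1000) + 1);
}

/* Seconds remaining at the measured rate. The +1 avoids dividing by zero
 * when no throughput has been measured yet. */
uint64_t sketch_eta(uint64_t bytes_left, uint64_t bps)
{
    return bytes_left / (bps + 1);
}
```

With these definitions, a rate of zero yields an ETA equal to the number of bytes left, so the figure is best read as a rough estimate rather than a guarantee.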

src/server/serve.h Normal file

@@ -0,0 +1,167 @@
#ifndef SERVE_H
#define SERVE_H
#include <sys/types.h>
#include <unistd.h>
#include <signal.h> /* for sig_atomic_t */
#include "flexnbd.h"
#include "parse.h"
#include "acl.h"
static const int block_allocation_resolution = 4096; //128<<10;
struct client_tbl_entry {
pthread_t thread;
union mysockaddr address;
struct client *client;
};
#define MAX_NBD_CLIENTS 16
#define CLIENT_KEEPALIVE_TIME 30
#define CLIENT_KEEPALIVE_INTVL 10
#define CLIENT_KEEPALIVE_PROBES 3
struct server {
/* The flexnbd wrapper this server is attached to */
struct flexnbd *flexnbd;
/** address/port to bind to */
union mysockaddr bind_to;
/** (static) file name to serve */
char *filename;
/** TCP backlog for listen() */
int tcp_backlog;
/** (static) file name of UNIX control socket (or NULL if none) */
char *control_socket_name;
/** size of file */
uint64_t size;
/** to interrupt accept loop and clients, write() to close_signal[1] */
struct self_pipe *close_signal;
/** access control list */
struct acl *acl;
/** acl_updated_signal will be signalled after the acl struct
* has been replaced
*/
struct self_pipe *acl_updated_signal;
/* Claimed around any updates to the ACL. */
struct flexthread_mutex *l_acl;
/* Claimed around starting a mirror so that it doesn't race with
* shutting down on a SIGTERM. */
struct flexthread_mutex *l_start_mirror;
struct mirror *mirror;
struct mirror_super *mirror_super;
/* This is used to stop the mirror from starting after we
* receive a SIGTERM */
int mirror_can_start;
int server_fd;
int control_fd;
/* the allocation_map keeps track of which blocks in the backing file
* have been allocated, or part-allocated on disc, with unallocated
* blocks presumed to contain zeroes (i.e. represented as sparse files
* by the filesystem). We can use this information when receiving
* incoming writes, and avoid writing zeroes to unallocated sections
* of the file which would needlessly increase disc usage. This
* bitmap will start at all-zeroes for an empty file, and tend towards
* all-ones as the file is written to (i.e. we assume that allocated
* blocks can never become unallocated again, as is the case with ext3
* at least).
*/
struct bitset *allocation_map;
/* when starting up, this thread builds the allocation_map */
pthread_t allocation_map_builder_thread;
/* when the thread has finished, it sets this to 1 */
volatile sig_atomic_t allocation_map_built;
volatile sig_atomic_t allocation_map_not_built;
int max_nbd_clients;
struct client_tbl_entry *nbd_client;
/** Should clients use the killswitch? */
int use_killswitch;
/** If this isn't set, newly accepted clients will be closed immediately */
int allow_new_clients;
/* Marker for whether this server has control over the data in
* the file, or if we're waiting to receive it from an inbound
* migration which hasn't yet finished.
*
* It's the value which controls the exit status of a serve or
* listen process.
*/
int success;
};
struct server *server_create(struct flexnbd *flexnbd,
char *s_ip_address,
char *s_port,
char *s_file,
int default_deny,
int acl_entries,
char **s_acl_entries,
int max_nbd_clients,
int use_killswitch, int success);
void server_destroy(struct server *);
int server_is_closed(struct server *serve);
void serve_signal_close(struct server *serve);
void serve_wait_for_close(struct server *serve);
void server_replace_acl(struct server *serve, struct acl *acl);
void server_control_arrived(struct server *serve);
int server_is_in_control(struct server *serve);
int server_default_deny(struct server *serve);
int server_acl_locked(struct server *serve);
void server_lock_acl(struct server *serve);
void server_unlock_acl(struct server *serve);
void server_lock_start_mirror(struct server *serve);
void server_unlock_start_mirror(struct server *serve);
int server_is_mirroring(struct server *serve);
uint64_t server_mirror_bytes_remaining(struct server *serve);
uint64_t server_mirror_eta(struct server *serve);
uint64_t server_mirror_bps(struct server *serve);
void server_abandon_mirror(struct server *serve);
void server_prevent_mirror_start(struct server *serve);
void server_allow_mirror_start(struct server *serve);
int server_mirror_can_start(struct server *serve);
/* These three functions are used by mirror around the final pass, to close
* existing clients and prevent new ones from being around
*/
void server_forbid_new_clients(struct server *serve);
void server_close_clients(struct server *serve);
void server_join_clients(struct server *serve);
void server_allow_new_clients(struct server *serve);
/* Returns a count (ish) of the number of currently-running client threads */
int server_count_clients(struct server *params);
void server_unlink(struct server *serve);
int do_serve(struct server *, struct self_pipe *);
struct mode_readwrite_params {
union mysockaddr connect_to;
union mysockaddr connect_from;
uint64_t from;
uint32_t len;
int data_fd;
int client;
};
#endif
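
The allocation_map comment in struct server describes a bitmap with one bit per block of the backing file. As a rough sketch of how a byte range maps onto such blocks, assuming the 4096-byte `block_allocation_resolution` shown above (the helper names are hypothetical, and the real bitset code may compute this differently):

```c
#include <stdint.h>

/* Mirrors block_allocation_resolution from serve.h; the helpers below are
 * illustrative only and are not part of flexnbd. */
#define SKETCH_RESOLUTION 4096

/* Index of the first block touched by a write starting at byte `from`. */
uint64_t sketch_first_block(uint64_t from)
{
    return from / SKETCH_RESOLUTION;
}

/* Index of the last block touched by a write of `len` bytes at `from`.
 * Assumes len > 0. */
uint64_t sketch_last_block(uint64_t from, uint64_t len)
{
    return (from + len - 1) / SKETCH_RESOLUTION;
}
```

A two-byte write at offset 4095 straddles a block boundary, so it touches blocks 0 and 1 and both would count as at least part-allocated.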

src/server/status.c Normal file

@@ -0,0 +1,82 @@
#include "status.h"
#include "serve.h"
#include "util.h"
struct status *status_create(struct server *serve)
{
NULLCHECK(serve);
struct status *status;
status = xmalloc(sizeof(struct status));
status->pid = getpid();
status->size = serve->size;
status->has_control = serve->success;
status->clients_allowed = serve->allow_new_clients;
status->num_clients = server_count_clients(serve);
server_lock_start_mirror(serve);
status->is_mirroring = NULL != serve->mirror;
if (status->is_mirroring) {
status->migration_duration = monotonic_time_ms();
if ((serve->mirror->migration_started) <
status->migration_duration) {
status->migration_duration -= serve->mirror->migration_started;
} else {
status->migration_duration = 0;
}
status->migration_duration /= 1000;
status->migration_speed = server_mirror_bps(serve);
status->migration_speed_limit =
serve->mirror->max_bytes_per_second;
status->migration_seconds_left = server_mirror_eta(serve);
status->migration_bytes_left =
server_mirror_bytes_remaining(serve);
}
server_unlock_start_mirror(serve);
return status;
}
#define BOOL_S(var) (var ? "true" : "false" )
#define PRINT_BOOL( var ) \
do{dprintf( fd, #var "=%s ", BOOL_S( status->var ) );}while(0)
#define PRINT_INT( var ) \
do{dprintf( fd, #var "=%d ", status->var );}while(0)
#define PRINT_UINT64( var ) \
do{dprintf( fd, #var "=%"PRIu64" ", status->var );}while(0)
int status_write(struct status *status, int fd)
{
PRINT_INT(pid);
PRINT_UINT64(size);
PRINT_BOOL(is_mirroring);
PRINT_BOOL(clients_allowed);
PRINT_INT(num_clients);
PRINT_BOOL(has_control);
if (status->is_mirroring) {
PRINT_UINT64(migration_speed);
PRINT_UINT64(migration_duration);
PRINT_UINT64(migration_seconds_left);
PRINT_UINT64(migration_bytes_left);
if (status->migration_speed_limit < UINT64_MAX) {
PRINT_UINT64(migration_speed_limit);
};
}
dprintf(fd, "\n");
return 1;
}
void status_destroy(struct status *status)
{
NULLCHECK(status);
free(status);
}
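
The PRINT_* macros above use the preprocessor's stringizing operator so that each struct member name becomes the key in the `key=value` status line. A small self-contained sketch of the same idea follows. It writes into a buffer with snprintf() rather than to a file descriptor with dprintf(), and the struct, macro and function names are hypothetical.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* A cut-down, hypothetical status struct for illustration. */
struct sketch_status {
    int num_clients;
    uint64_t size;
    int has_control;
};

/* #field turns the member name into the key text; adjacent string
 * literals are then concatenated into one format string. */
#define FMT_INT(field)    #field "=%d "
#define FMT_UINT64(field) #field "=%" PRIu64 " "
#define FMT_BOOL(field)   #field "=%s "

/* Formats one status line into buf; returns what snprintf() returns. */
int sketch_status_format(const struct sketch_status *st, char *buf,
                         size_t len)
{
    return snprintf(buf, len,
                    FMT_UINT64(size) FMT_INT(num_clients)
                    FMT_BOOL(has_control) "\n",
                    st->size, st->num_clients,
                    st->has_control ? "true" : "false");
}
```

As in status_write(), every pair is followed by a space, so the line ends with a space before the newline.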


@@ -64,6 +64,8 @@
* Our current best estimate of how many seconds are left before the migration
* is finished.
*
* migration_bytes_left:
* The number of bytes remaining to migrate.
*/
@@ -73,29 +75,29 @@
#include <unistd.h>
struct status {
pid_t pid;
uint64_t size;
int has_control;
int clients_allowed;
int num_clients;
int is_mirroring;
pid_t pid;
uint64_t size;
int has_control;
int clients_allowed;
int num_clients;
int is_mirroring;
uint64_t migration_duration;
uint64_t migration_speed;
uint64_t migration_speed_limit;
uint64_t migration_seconds_left;
uint64_t migration_duration;
uint64_t migration_speed;
uint64_t migration_speed_limit;
uint64_t migration_seconds_left;
uint64_t migration_bytes_left;
};
/** Create a status object for the given server. */
struct status * status_create( struct server * );
struct status *status_create(struct server *);
/** Output the given status object to the given file descriptor */
int status_write( struct status *, int fd );
int status_write(struct status *, int fd);
/** Free the status object */
void status_destroy( struct status * );
void status_destroy(struct status *);
#endif


@@ -1,249 +0,0 @@
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <arpa/inet.h>
#include <netinet/tcp.h>
#include <sys/un.h>
#include "sockutil.h"
#include "util.h"
size_t sockaddr_size( const struct sockaddr* sa )
{
struct sockaddr_un* un = (struct sockaddr_un*) sa;
size_t ret = 0;
switch( sa->sa_family ) {
case AF_INET:
ret = sizeof( struct sockaddr_in );
break;
case AF_INET6:
ret = sizeof( struct sockaddr_in6 );
break;
case AF_UNIX:
ret = sizeof( un->sun_family ) + SUN_LEN( un );
break;
}
return ret;
}
const char* sockaddr_address_string( const struct sockaddr* sa, char* dest, size_t len )
{
NULLCHECK( sa );
NULLCHECK( dest );
struct sockaddr_in* in = ( struct sockaddr_in* ) sa;
struct sockaddr_in6* in6 = ( struct sockaddr_in6* ) sa;
struct sockaddr_un* un = ( struct sockaddr_un* ) sa;
unsigned short real_port = ntohs( in->sin_port ); // common to in and in6
size_t size;
const char* ret = NULL;
memset( dest, 0, len );
if ( sa->sa_family == AF_INET ) {
ret = inet_ntop( AF_INET, &in->sin_addr, dest, len );
} else if ( sa->sa_family == AF_INET6 ) {
ret = inet_ntop( AF_INET6, &in6->sin6_addr, dest, len );
} else if ( sa->sa_family == AF_UNIX ) {
ret = strncpy( dest, un->sun_path, SUN_LEN( un ) );
}
if ( ret == NULL ) {
strncpy( dest, "???", len );
}
if ( NULL != ret && real_port > 0 && sa->sa_family != AF_UNIX ) {
size = strlen( dest );
snprintf( dest + size, len - size, " port %d", real_port );
}
return ret;
}
int sock_set_reuseaddr( int fd, int optval )
{
return setsockopt( fd, SOL_SOCKET, SO_REUSEADDR, &optval, sizeof(optval) );
}
/* Set the tcp_nodelay option */
int sock_set_tcp_nodelay( int fd, int optval )
{
return setsockopt( fd, IPPROTO_TCP, TCP_NODELAY, &optval, sizeof(optval) );
}
int sock_set_nonblock( int fd, int optval )
{
int flags = fcntl( fd, F_GETFL );
if ( flags == -1 ) {
return -1;
}
if ( optval ) {
flags = flags | O_NONBLOCK;
} else {
flags = flags & (~O_NONBLOCK);
}
return fcntl( fd, F_SETFL, flags );
}
int sock_try_bind( int fd, const struct sockaddr* sa )
{
int bind_result;
char s_address[256];
int retry = 1;
sockaddr_address_string( sa, &s_address[0], 256 );
do {
bind_result = bind( fd, sa, sockaddr_size( sa ) );
if ( 0 == bind_result ) {
info( "Bound to %s", s_address );
break;
}
else {
warn( SHOW_ERRNO( "Couldn't bind to %s", s_address ) );
switch ( errno ) {
/* bind() can give us EACCES, EADDRINUSE, EADDRNOTAVAIL, EBADF,
* EINVAL, ENOTSOCK, EFAULT, ELOOP, ENAMETOOLONG, ENOENT,
* ENOMEM, ENOTDIR, EROFS
*
* Any of these other than EADDRINUSE & EADDRNOTAVAIL signify
* that there's a logic error somewhere.
*
* EADDRINUSE is fatal: if there's something already where we
* want to be listening, we have no guarantees that any clients
* will cope with it.
*/
case EADDRNOTAVAIL:
debug( "retrying" );
sleep( 1 );
continue;
case EADDRINUSE:
warn( "%s in use, giving up.", s_address );
retry = 0;
break;
default:
warn( "giving up" );
retry = 0;
}
}
} while ( retry );
return bind_result;
}
int sock_try_select(int nfds, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, struct timeval *timeout)
{
int result;
do {
result = select(nfds, readfds, writefds, exceptfds, timeout);
if ( errno != EINTR ) {
break;
}
} while ( result == -1 );
return result;
}
int sock_try_connect( int fd, struct sockaddr* to, socklen_t addrlen, int wait )
{
fd_set fds;
struct timeval tv = { wait, 0 };
int result = 0;
if ( sock_set_nonblock( fd, 1 ) == -1 ) {
warn( SHOW_ERRNO( "Failed to set socket non-blocking for connect()" ) );
return connect( fd, to, addrlen );
}
FD_ZERO( &fds );
FD_SET( fd, &fds );
do {
result = connect( fd, to, addrlen );
if ( result == -1 ) {
switch( errno ) {
case EINPROGRESS:
result = 0;
break; /* success */
case EAGAIN:
case EINTR:
/* Try connect() again. This only breaks out of the switch,
* not the do...while loop. since result == -1, we go again.
*/
break;
default:
warn( SHOW_ERRNO( "Failed to connect()" ) );
goto out;
}
}
} while ( result == -1 );
if ( -1 == sock_try_select( FD_SETSIZE, NULL, &fds, NULL, &tv) ) {
warn( SHOW_ERRNO( "failed to select() on non-blocking connect" ) );
result = -1;
goto out;
}
if ( !FD_ISSET( fd, &fds ) ) {
result = -1;
errno = ETIMEDOUT;
goto out;
}
int scratch;
socklen_t s_size = sizeof( scratch );
if ( getsockopt( fd, SOL_SOCKET, SO_ERROR, &scratch, &s_size ) == -1 ) {
result = -1;
warn( SHOW_ERRNO( "getsockopt() failed" ) );
goto out;
}
if ( scratch == EINPROGRESS ) {
scratch = ETIMEDOUT;
}
result = scratch ? -1 : 0;
errno = scratch;
out:
if ( sock_set_nonblock( fd, 0 ) == -1 ) {
warn( SHOW_ERRNO( "Failed to make socket blocking after connect()" ) );
return -1;
}
debug( "sock_try_connect: %i", result );
return result;
}
int sock_try_close( int fd )
{
int result;
do {
result = close( fd );
if ( result == -1 ) {
if ( EINTR == errno ) {
continue; /* retry EINTR */
} else {
warn( SHOW_ERRNO( "Failed to close() fd %i", fd ) );
break; /* Other errors get reported */
}
}
} while( 0 );
return result;
}
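
One detail of sock_try_close() above is worth illustrating: inside `do { ... } while (0)`, a `continue` jumps to the loop condition, which is false, so the body is not run again. The sketch below uses hypothetical helper names and a counter in place of close() to contrast that shape with a loop whose condition does drive a retry.

```c
/* Counts how many times the body runs when it always hits `continue`. */
int sketch_attempts_do_while_zero(void)
{
    int attempts = 0;
    do {
        attempts++;
        continue;           /* jumps to the `while (0)` test and exits */
    } while (0);
    return attempts;
}

/* A loop that does retry: it repeats while the simulated call reports an
 * interruption. `interruptions` stands in for the number of times a real
 * call would fail with EINTR. */
int sketch_attempts_retry(int interruptions)
{
    int attempts = 0;
    int interrupted;
    do {
        attempts++;
        interrupted = (interruptions-- > 0);
    } while (interrupted);
    return attempts;
}
```

Whether close() should be retried after EINTR at all is platform dependent, since POSIX leaves the descriptor's state unspecified in that case. The single-attempt behaviour may therefore be acceptable in practice, even though the comment in the code suggests a retry.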


@@ -1,41 +0,0 @@
#ifndef SOCKUTIL_H
#define SOCKUTIL_H
#include <sys/time.h>
#include <sys/socket.h>
#include <sys/select.h>
/* Returns the size of the sockaddr, or 0 on error */
size_t sockaddr_size(const struct sockaddr* sa);
/* Convert a sockaddr into an address. Like inet_ntop, it returns dest if
* successful, NULL otherwise. In the latter case, dest will contain "???"
*/
const char* sockaddr_address_string(const struct sockaddr* sa, char* dest, size_t len);
/* Set the SO_REUSEADDR option */
int sock_set_reuseaddr(int fd, int optval);
/* Set the tcp_nodelay option */
int sock_set_tcp_nodelay(int fd, int optval);
/* TODO: Set the tcp_cork option */
// int sock_set_cork(int fd, int optval);
int sock_set_nonblock(int fd, int optval);
/* Attempt to bind the fd to the sockaddr, retrying common transient failures */
int sock_try_bind(int fd, const struct sockaddr* sa);
/* Try to call select(), retrying EINTR */
int sock_try_select(int nfds, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, struct timeval *timeout);
/* Try to call connect(), timing out after wait seconds */
int sock_try_connect( int fd, struct sockaddr* to, socklen_t addrlen, int wait );
/* Try to call close(), retrying EINTR */
int sock_try_close( int fd );
#endif


@@ -1,78 +0,0 @@
#include "status.h"
#include "serve.h"
#include "util.h"
struct status * status_create( struct server * serve )
{
NULLCHECK( serve );
struct status * status;
status = xmalloc( sizeof( struct status ) );
status->pid = getpid();
status->size = serve->size;
status->has_control = serve->success;
status->clients_allowed = serve->allow_new_clients;
status->num_clients = server_count_clients( serve );
server_lock_start_mirror( serve );
status->is_mirroring = NULL != serve->mirror;
if ( status->is_mirroring ) {
status->migration_duration = monotonic_time_ms();
if ( ( serve->mirror->migration_started ) < status->migration_duration ) {
status->migration_duration -= serve->mirror->migration_started;
} else {
status->migration_duration = 0;
}
status->migration_duration /= 1000;
status->migration_speed = serve->mirror->all_dirty / ( status->migration_duration + 1 );
status->migration_speed_limit = serve->mirror->max_bytes_per_second;
status->migration_seconds_left = server_mirror_eta( serve );
}
server_unlock_start_mirror( serve );
return status;
}
#define BOOL_S(var) (var ? "true" : "false" )
#define PRINT_BOOL( var ) \
do{dprintf( fd, #var "=%s ", BOOL_S( status->var ) );}while(0)
#define PRINT_INT( var ) \
do{dprintf( fd, #var "=%d ", status->var );}while(0)
#define PRINT_UINT64( var ) \
do{dprintf( fd, #var "=%"PRIu64" ", status->var );}while(0)
int status_write( struct status * status, int fd )
{
PRINT_INT( pid );
PRINT_UINT64( size );
PRINT_BOOL( is_mirroring );
PRINT_BOOL( clients_allowed );
PRINT_INT( num_clients );
PRINT_BOOL( has_control );
if ( status->is_mirroring ) {
PRINT_UINT64( migration_speed );
PRINT_UINT64( migration_duration );
PRINT_UINT64( migration_seconds_left );
if ( status->migration_speed_limit < UINT64_MAX ) {
PRINT_UINT64( migration_speed_limit );
};
}
dprintf(fd, "\n");
return 1;
}
void status_destroy( struct status * status )
{
NULLCHECK( status );
free( status );
}


@@ -1,87 +0,0 @@
#include <stdarg.h>
#include <stdio.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <malloc.h>
#include <unistd.h>
#include <time.h>
#include "util.h"
pthread_key_t cleanup_handler_key;
int log_level = 2;
void error_init(void)
{
pthread_key_create(&cleanup_handler_key, free);
}
void error_handler(int fatal)
{
DECLARE_ERROR_CONTEXT(context);
if (context) {
longjmp(context->jmp, fatal ? 1 : 2 );
}
else {
if ( fatal ) { abort(); }
else { pthread_exit((void*) 1); }
}
}
void exit_err( const char *msg )
{
fprintf( stderr, "%s\n", msg );
exit( 1 );
}
void mylog(int line_level, const char* format, ...)
{
if (line_level < log_level) { return; }
va_list argptr;
va_start(argptr, format);
vfprintf(stderr, format, argptr);
va_end(argptr);
}
uint64_t monotonic_time_ms()
{
struct timespec ts;
uint64_t seconds_ms, nanoseconds_ms;
FATAL_IF_NEGATIVE(
clock_gettime(CLOCK_MONOTONIC, &ts),
SHOW_ERRNO( "clock_gettime failed" )
);
seconds_ms = ts.tv_sec;
seconds_ms = seconds_ms * 1000;
nanoseconds_ms = ts.tv_nsec;
nanoseconds_ms = nanoseconds_ms / 1000000;
return seconds_ms + nanoseconds_ms;
}
void* xrealloc(void* ptr, size_t size)
{
void* p = realloc(ptr, size);
FATAL_IF_NULL(p, "couldn't x%s %zu bytes", ptr ? "realloc" : "malloc", size);
return p;
}
void* xmalloc(size_t size)
{
void* p = xrealloc(NULL, size);
memset(p, 0, size);
return p;
}
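
The unit conversion inside monotonic_time_ms() above can be sketched without the clock call. This is a minimal illustration with a hypothetical function name: it takes the seconds and nanoseconds fields as plain integers and truncates the nanoseconds to whole milliseconds, as the original does.

```c
#include <stdint.h>

/* Hypothetical helper: combines a seconds field and a nanoseconds field
 * (as found in struct timespec) into milliseconds. Sub-millisecond
 * precision is discarded by the integer division. */
uint64_t sketch_to_ms(uint64_t seconds, uint64_t nanoseconds)
{
    uint64_t seconds_ms = seconds * 1000;
    uint64_t nanoseconds_ms = nanoseconds / 1000000;
    return seconds_ms + nanoseconds_ms;
}
```

Because the values come from CLOCK_MONOTONIC in the original, differences between two such readings are unaffected by wall-clock adjustments, which is what the migration duration arithmetic relies on.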


@@ -1,33 +1,41 @@
# encoding: utf-8
require 'flexnbd'
require 'file_writer'
class Environment
attr_reader( :blocksize, :filename1, :filename2, :ip,
:port1, :port2, :nbd1, :nbd2, :file1, :file2 )
attr_reader(:blocksize, :filename1, :filename2, :ip,
:port1, :port2, :nbd1, :nbd2, :file1, :file2)
def initialize
@blocksize = 1024
@filename1 = "/tmp/.flexnbd.test.#{$$}.#{Time.now.to_i}.1"
@filename2 = "/tmp/.flexnbd.test.#{$$}.#{Time.now.to_i}.2"
@ip = "127.0.0.1"
@available_ports = [*40000..41000] - listening_ports
@filename1 = "/tmp/.flexnbd.test.#{$PROCESS_ID}.#{Time.now.to_i}.1"
@filename2 = "/tmp/.flexnbd.test.#{$PROCESS_ID}.#{Time.now.to_i}.2"
@ip = '127.0.0.1'
@available_ports = [*40_000..41_000] - listening_ports
@port1 = @available_ports.shift
@port2 = @available_ports.shift
@nbd1 = FlexNBD::FlexNBD.new("../../build/flexnbd", @ip, @port1)
@nbd2 = FlexNBD::FlexNBD.new("../../build/flexnbd", @ip, @port2)
@nbd1 = FlexNBD::FlexNBD.new('../../build/flexnbd', @ip, @port1)
@nbd2 = FlexNBD::FlexNBD.new('../../build/flexnbd', @ip, @port2)
@fake_pid = nil
end
def proxy1(port=@port2)
@nbd1.proxy(@ip, port)
end
def proxy2(port=@port1)
@nbd2.proxy(@ip, port)
def blocksize=(b)
raise RuntimeError, "Unable to change blocksize after files have been opened" if @file1 or @file2
@blocksize = b
end
def prefetch_proxy!
@nbd1.prefetch_proxy = true
@nbd2.prefetch_proxy = true
end
def proxy1(port = @port2)
@nbd1.proxy(@ip, port)
end
def proxy2(port = @port1)
@nbd2.proxy(@ip, port)
end
def serve1(*acl)
@nbd1.serve(@filename1, *acl)
@@ -37,29 +45,26 @@ class Environment
@nbd2.serve(@filename2, *acl)
end
def listen1( *acl )
@nbd1.listen( @filename1, *(acl.empty? ? @acl1: acl) )
def listen1(*acl)
@nbd1.listen(@filename1, *(acl.empty? ? @acl1 : acl))
end
def listen2( *acl )
@nbd2.listen( @filename2, *acl )
def listen2(*acl)
@nbd2.listen(@filename2, *acl)
end
def break1
@nbd1.break
end
def acl1( *acl )
@nbd1.acl( *acl )
def acl1(*acl)
@nbd1.acl(*acl)
end
def acl2( *acl )
@nbd2.acl( *acl )
def acl2(*acl)
@nbd2.acl(*acl)
end
def status1
@nbd1.status.first
end
@@ -68,23 +73,20 @@ class Environment
@nbd2.status.first
end
def mirror12
@nbd1.mirror( @nbd2.ip, @nbd2.port )
@nbd1.mirror(@nbd2.ip, @nbd2.port)
end
def mirror12_unchecked
@nbd1.mirror_unchecked( @nbd2.ip, @nbd2.port, nil, nil, 10 )
@nbd1.mirror_unchecked(@nbd2.ip, @nbd2.port, nil, nil, 10)
end
def mirror12_unlink
@nbd1.mirror_unlink( @nbd2.ip, @nbd2.port, 2 )
@nbd1.mirror_unlink(@nbd2.ip, @nbd2.port, 2)
end
def write1( data )
@nbd1.write( 0, data )
def write1(data)
@nbd1.write(0, data)
end
def writefile1(data)
@@ -95,63 +97,54 @@ class Environment
@file2 = FileWriter.new(@filename2, @blocksize).write(data)
end
def truncate1( size )
def truncate1(size)
system "truncate -s #{size} #{@filename1}"
end
def listening_ports
`netstat -ltn`.
split("\n").
map { |x| x.split(/\s+/) }[2..-1].
map { |l| l[3].split(":")[-1].to_i }
`netstat -ltn`
.split("\n")
.map { |x| x.split(/\s+/) }[2..-1]
.map { |l| l[3].split(':')[-1].to_i }
end
def cleanup
if @fake_pid
begin
Process.waitpid2( @fake_pid )
Process.waitpid2(@fake_pid)
rescue Errno::ESRCH
end
end
@nbd1.can_die(0)
@nbd1.kill
@nbd2.kill
[@filename1, @filename2].each do |f|
File.unlink(f) if File.exists?(f)
File.unlink(f) if File.exist?(f)
end
end
def run_fake( name, addr, port, sock=nil )
fakedir = File.join( File.dirname( __FILE__ ), "fakes" )
fakeglob = File.join( fakedir, name ) + "*"
fake = Dir[fakeglob].sort.find { |fn|
File.executable?( fn )
}
def run_fake(name, addr, port, sock = nil)
fakedir = File.join(File.dirname(__FILE__), 'fakes')
fakeglob = File.join(fakedir, name) + '*'
fake = Dir[fakeglob].sort.find do |fn|
File.executable?(fn)
end
raise "no fake executable at #{fakeglob}" unless fake
raise "no addr" unless addr
raise "no port" unless port
raise 'no addr' unless addr
raise 'no port' unless port
@fake_pid = fork do
exec [fake, addr, port, @nbd1.pid, sock].map{|x| x.to_s}.join(" ")
exec [fake, addr, port, @nbd1.pid, sock].map(&:to_s).join(' ')
end
sleep(0.5)
end
def fake_reports_success
_,status = Process.waitpid2( @fake_pid )
_, status = Process.waitpid2(@fake_pid)
@fake_pid = nil
status.success?
end
end # class Environment


@@ -1,6 +1,4 @@
#!/usr/bin/env ruby
# encoding: utf-8
# Open a server, accept a client, then cancel the migration by issuing
# a break command.
@@ -8,28 +6,27 @@ require 'flexnbd/fake_dest'
include FlexNBD
addr, port, src_pid, sock = *ARGV
server = FakeDest.new( addr, port )
server = FakeDest.new(addr, port)
client = server.accept
ctrl = UNIXSocket.open( sock )
ctrl = UNIXSocket.open(sock)
Process.kill("STOP", src_pid.to_i)
ctrl.write( "break\n" )
Process.kill('STOP', src_pid.to_i)
ctrl.write("break\n")
ctrl.close_write
client.write_hello
Process.kill("CONT", src_pid.to_i)
Process.kill('CONT', src_pid.to_i)
fail "Unexpected control response" unless
raise 'Unexpected control response' unless
ctrl.read =~ /0: mirror stopped/
client2 = nil
begin
client2 = server.accept( "Expected timeout" )
fail "Unexpected reconnection"
client2 = server.accept('Expected timeout')
raise 'Unexpected reconnection'
rescue Timeout::Error
# expected
end
client.close
exit(0)


@@ -1,6 +1,4 @@
#!/usr/bin/env ruby
# encoding: utf-8
# Receive a mirror, and disconnect after sending the entrust reply but
# before it can send the disconnect signal.
#
@@ -11,26 +9,25 @@ require 'flexnbd/fake_dest'
include FlexNBD
addr, port, src_pid = *ARGV
server = FakeDest.new( addr, port )
server = FakeDest.new(addr, port)
client = server.accept
client.write_hello
while (req = client.read_request; req[:type] == 1)
client.read_data( req[:len] )
client.write_reply( req[:handle] )
while (req = client.read_request)[:type] == 1
client.read_data(req[:len])
client.write_reply(req[:handle])
end
system "kill -STOP #{src_pid}"
client.write_reply( req[:handle] )
client.write_reply(req[:handle])
client.close
system "kill -CONT #{src_pid}"
sleep( 0.25 )
client2 = server.accept( "Timed out waiting for a reconnection" )
sleep(0.25)
client2 = server.accept('Timed out waiting for a reconnection')
client2.close
server.close
$stderr.puts "done"
warn 'done'
exit(0)


@@ -10,12 +10,12 @@ require 'flexnbd/fake_dest'
include FlexNBD
addr, port = *ARGV
server = FakeDest.new( addr, port )
client = server.accept( "Timed out waiting for a connection" )
server = FakeDest.new(addr, port)
client = server.accept('Timed out waiting for a connection')
client.write_hello
client.close
new_client = server.accept( "Timed out waiting for a reconnection" )
new_client = server.accept('Timed out waiting for a reconnection')
new_client.close
server.close


@@ -11,13 +11,13 @@ require 'flexnbd/fake_dest'
include FlexNBD
addr, port = *ARGV
server = FakeDest.new( addr, port )
client = server.accept( "Timed out waiting for a connection" )
server = FakeDest.new(addr, port)
client = server.accept('Timed out waiting for a connection')
client.write_hello
client.read_request
client.close
new_client = server.accept( "Timed out waiting for a reconnection" )
new_client = server.accept('Timed out waiting for a reconnection')
new_client.close
server.close


@@ -1,6 +1,4 @@
#!/usr/bin/env ruby
# encoding: utf-8
# Open a server, accept a client, then we expect a single write
# followed by an entrust. However, we disconnect after the write so
# the entrust will fail. We don't expect a reconnection: the sender
@@ -10,16 +8,16 @@ require 'flexnbd/fake_dest'
include FlexNBD
addr, port, src_pid = *ARGV
server = FakeDest.new( addr, port )
server = FakeDest.new(addr, port)
client = server.accept
client.write_hello
req = client.read_request
data = client.read_data( req[:len] )
data = client.read_data(req[:len])
Process.kill("STOP", src_pid.to_i)
client.write_reply( req[:handle], 0 )
Process.kill('STOP', src_pid.to_i)
client.write_reply(req[:handle], 0)
client.close
Process.kill("CONT", src_pid.to_i)
Process.kill('CONT', src_pid.to_i)
exit(0)


@@ -1,19 +1,16 @@
#!/usr/bin/env ruby
# encoding: utf-8
require 'flexnbd/fake_dest'
include FlexNBD
addr, port = *ARGV
server = FakeDest.new( addr, port )
server = FakeDest.new(addr, port)
client = server.accept
client.write_hello
handle = client.read_request[:handle]
client.write_error( handle )
client.write_error(handle)
client2 = server.accept( "Timed out waiting for a reconnection" )
client2 = server.accept('Timed out waiting for a reconnection')
client.close
client2.close


@@ -14,8 +14,8 @@ require 'flexnbd/fake_dest'
include FlexNBD
addr, port = *ARGV
server = FakeDest.new( addr, port )
client = server.accept( "Client didn't make a connection" )
server = FakeDest.new(addr, port)
client = server.accept("Client didn't make a connection")
# Sleep for one second past the timeout (a bit of slop in case ruby
# doesn't launch things quickly)
@@ -26,10 +26,10 @@ client.close
# Invert the sense of the timeout exception, since we *don't* want a
# connection attempt
begin
server.accept( "Expected timeout" )
fail "Unexpected reconnection"
server.accept('Expected timeout')
raise 'Unexpected reconnection'
rescue Timeout::Error
# expected
# expected
end
server.close


@@ -1,6 +1,4 @@
#!/usr/bin/env ruby
# encoding: utf-8
# Open a socket, say hello, receive a write, then sleep for >
# MS_REQUEST_LIMIT_SECS seconds. This should tell the source that the
# write has gone MIA, and we expect a reconnect.
@@ -9,18 +7,24 @@ require 'flexnbd/fake_dest'
include FlexNBD
addr, port = *ARGV
server = FakeDest.new( addr, port )
client1 = server.accept( server )
server = FakeDest.new(addr, port)
client1 = server.accept(server)
client1.write_hello
client1.read_request
t = Thread.start do
client2 = server.accept( "Timed out waiting for a reconnection",
FlexNBD::MS_REQUEST_LIMIT_SECS + 2 )
client2 = server.accept('Timed out waiting for a reconnection',
FlexNBD::MS_REQUEST_LIMIT_SECS + 2)
client2.close
end
sleep( FlexNBD::MS_REQUEST_LIMIT_SECS + 2 )
sleep_time = if ENV.key?('FLEXNBD_MS_REQUEST_LIMIT_SECS')
ENV['FLEXNBD_MS_REQUEST_LIMIT_SECS'].to_f
else
FlexNBD::MS_REQUEST_LIMIT_SECS
end
sleep(sleep_time + 2.0)
client1.close
t.join


@@ -7,21 +7,21 @@ include FlexNBD
Thread.abort_on_exception
addr, port = *ARGV
server = FakeDest.new( addr, port )
server = FakeDest.new(addr, port)
client1 = server.accept
# We don't expect a reconnection attempt.
t = Thread.new do
begin
client2 = server.accept( "Timed out waiting for a reconnection",
FlexNBD::MS_RETRY_DELAY_SECS + 1 )
fail "Unexpected reconnection"
client2 = server.accept('Timed out waiting for a reconnection',
FlexNBD::MS_RETRY_DELAY_SECS + 1)
raise 'Unexpected reconnection'
rescue Timeout::Error
#expected
# expected
end
end
client1.write_hello( :magic => :wrong )
client1.write_hello(magic: :wrong)
t.join


@@ -9,7 +9,7 @@ include FlexNBD
Thread.abort_on_exception = true
addr, port = *ARGV
server = FakeDest.new( addr, port )
server = FakeDest.new(addr, port)
client = server.accept
t = Thread.new do
@@ -18,21 +18,21 @@ t = Thread.new do
# so it makes no sense to continue. This means we have to invert the
# sense of the exception.
begin
client2 = server.accept( "Timed out waiting for a reconnection",
FlexNBD::MS_RETRY_DELAY_SECS + 1 )
client2 = server.accept('Timed out waiting for a reconnection',
FlexNBD::MS_RETRY_DELAY_SECS + 1)
client2.close
fail "Unexpected reconnection."
raise 'Unexpected reconnection.'
rescue Timeout::Error
end
end
client.write_hello( :size => :wrong )
client.write_hello(size: :wrong)
t.join
# Now check that the source closed the first socket (yes, this was an
# actual bug)
fail "Didn't close socket" unless client.disconnected?
raise "Didn't close socket" unless client.disconnected?
exit 0


@@ -7,18 +7,16 @@ require 'flexnbd/fake_dest'
include FlexNBD
addr, port = *ARGV
server = FakeDest.new( addr, port )
server = FakeDest.new(addr, port)
server.accept.close
begin
server.accept
fail "Unexpected reconnection"
raise 'Unexpected reconnection'
rescue Timeout::Error
# expected
end
server.close
exit(0)


@@ -8,8 +8,8 @@ require 'flexnbd/fake_dest'
include FlexNBD
addr, port, pid = *ARGV
server = FakeDest.new( addr, port )
client = server.accept( "Timed out waiting for a connection" )
server = FakeDest.new(addr, port)
client = server.accept('Timed out waiting for a connection')
client.write_hello
Process.kill(15, pid.to_i)

Some files were not shown because too many files have changed in this diff.