79 Commits

Author SHA1 Message Date
Patrick J Cherry
4cd7e764bb Updated changelog 2016-10-04 21:22:07 +01:00
Patrick J Cherry
4f535fbb02 Merge branch 'master' of gitlab.bytemark.co.uk:open-source/flexnbd-c into debian 2016-10-04 21:14:26 +01:00
James Carter
218c55fb63 Merge branch 'simplify-nbd-handles-part-deux' into 'master'
Simplified NBD handle comparisons

8 bytes, therefore a uint64_t to compare to, no need for memcmp()

Signed-off-by: Michel Pollet <buserror@gmail.com>

See merge request !5
2016-10-04 15:49:07 +01:00
Michel Pollet
956a602475 Simplified NBD handle comparisons
8 bytes, therefore a uint64_t to compare to, no need for memcmp()

Signed-off-by: Michel Pollet <buserror@gmail.com>
2016-10-04 15:41:48 +01:00
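The idea behind this commit can be sketched as follows. This is a minimal illustration, not the project's actual code: the union name `nbd_handle_t` and both helper functions are hypothetical, and only the member name `w` is taken from the diff further down the page.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical handle type: 8 raw bytes overlaid with one 64-bit word. */
typedef union {
    uint8_t  b[8];
    uint64_t w;
} nbd_handle_t;

/* Byte-wise comparison, as it would look with memcmp(). */
int handles_equal_memcmp(const nbd_handle_t *x, const nbd_handle_t *y)
{
    return memcmp(x->b, y->b, 8) == 0;
}

/* Same result with a single integer comparison. The handle is opaque,
 * so byte order does not matter for an equality test. */
int handles_equal_word(const nbd_handle_t *x, const nbd_handle_t *y)
{
    return x->w == y->w;
}
```

Both helpers give the same answer for any pair of handles; the word form simply avoids a library call for an 8-byte compare.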
James Carter
26a0a82f9d Merge branch '12-fix-bind' into 'master'
Attempt at fixing bind() bug

This will prevent the bind() wrapper from looping forever in some cases. I
could not reproduce the issue, but this removes the only infinite loop I
could find.

Closes #12

See merge request !3
2016-10-04 15:41:37 +01:00
Michel Pollet
76e0476113 Attempt at fixing bind() bug
This will prevent the bind() wrapper from looping forever in some cases. I
could not reproduce the issue, but this removes the only infinite loop I
could find.

Signed-off-by: Michel Pollet <buserror@gmail.com>
2016-10-04 15:36:46 +01:00
James Carter
e3360a3a1b Merge branch 'cherry-pick-41f25408' into 'master'
Close socket fix, might relate to migration crashing

This was listed as a bug, and was immediately picked up by the static
analyzer anyway; this is very likely the cause of the
migration-cancel-crash bug.

closes #10 and possibly closes #11

See merge request !1
2016-09-14 11:29:12 +01:00
Michel Pollet
1fefe1a669 Close socket fix, might relate to migration crashing
This was listed as a bug, and was immediately picked up by the static
analyzer anyway; this is very likely the cause of the
migration-cancel-crash bug.

Signed-off-by: Michel Pollet <buserror@gmail.com>
2016-09-14 10:45:49 +01:00
Patrick J Cherry
4ed8d49b2c Updated rules to skip ruby tests, and just use the normal make check 2016-08-31 10:06:07 +01:00
Patrick J Cherry
3af0e84f5f Updated Debian packaging to be in a separate branch.
This should allow us to use git-buildpackage to build our packages.
2016-08-30 21:57:00 +01:00
Patrick J Cherry
ba14943b60 Removed old changelog.template 2016-08-30 21:49:54 +01:00
Patrick J Cherry
4a709e73f8 Moved .hgignore to .gitignore 2016-08-30 21:47:25 +01:00
Patrick J Cherry
91a8946ddc Removed debian directory 2016-08-30 21:46:59 +01:00
nick
20f99b4554 flexnbd: We only require 1/8th of the memory we allocate for bitsets (bits vs. bytes confusion) 2015-05-13 09:25:09 +01:00
nick
c363991cfd Makefile: Add -lm to LLDFLAGS 2015-04-01 12:39:07 +01:00
Alex Young
c41eeff2fc Moved the server-specific files into src/server 2014-03-11 11:05:43 +00:00
Alex Young
5960e4d10b Remove the proxy's dependency on flexnbd.h 2014-03-11 10:37:00 +00:00
Alex Young
f0911b5c6c Tighten up some variable scopes. 2014-03-11 10:24:29 +00:00
Alex Young
b063f41ba8 Avoid a potential null pointer dereference 2014-03-11 09:57:19 +00:00
Alex Young
28c7e43e45 Fix a harmless buffer overflow 2014-03-11 09:49:25 +00:00
Alex Young
9326b6b882 Merge 2014-02-27 16:18:17 +00:00
Alex Young
f93476ebd3 Replace off64_t with uint64_t where it makes sense to do so.
It looks like off64_t was propagated through the code from the return
type of lseek64(), which isn't appropriate in many of the places we're
using it.
2014-02-27 16:04:25 +00:00
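A small sketch of the pattern this commit message describes: keep the signed type only where the seek call needs it to signal errors, and hand an unsigned size to callers. The helper name `file_size` is hypothetical, and the sketch uses plain `lseek()`/`off_t` rather than the `lseek64()`/`off64_t` pair named in the commit.

```c
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: report the size of an open file through an
 * unsigned out-parameter. Returns 0 on success, -1 on error. */
int file_size(int fd, uint64_t *out_size)
{
    /* lseek() reports failure as (off_t)-1, so the raw result stays signed. */
    off_t size = lseek(fd, 0, SEEK_END);
    if (size < 0) {
        return -1;               /* *out_size is left untouched */
    }
    /* The error case has been ruled out, so the value is non-negative. */
    *out_size = (uint64_t) size;
    return 0;
}
```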
Alex Young
666b60ae1c Allow subset reads in prefetch_contains and prefetch_offset 2014-02-27 14:54:18 +00:00
nick
f48bf2b296 Automated merge with ssh://dev/flexnbd-c 2014-02-27 14:33:01 +00:00
nick
705164ae3b Cork/uncork in mirror - socket_connect already sets nodelay 2014-02-27 14:32:54 +00:00
nick
dbe7053bf3 Avoid some false positives 2014-02-27 14:32:26 +00:00
Alex Young
fa8023cf69 Proxy prefetch cache becomes a command-line argument. 2014-02-27 14:21:36 +00:00
nick
aba802d415 bitset: Allocate the right amount of memory
We were calculating the wrong number of words per byte in the first
place, and then passing the number of *words* to malloc, which expects
the number of *bytes*.

Fix both errors
2014-02-27 12:57:09 +00:00
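The two errors named in this commit (a wrong bits-to-words calculation, and passing a word count where malloc expects a byte count) can be illustrated with a short sketch. The type `bitfield_word_t` and both function names are assumptions made for the example, not the project's real definitions.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint64_t bitfield_word_t;      /* hypothetical word type */
#define BITS_PER_WORD (sizeof(bitfield_word_t) * CHAR_BIT)

/* Number of words needed to hold `bits` bits, rounding up. */
size_t words_for_bits(size_t bits)
{
    return (bits + BITS_PER_WORD - 1) / BITS_PER_WORD;
}

/* Allocate a zeroed bitfield. calloc() takes an element count and an
 * element size, so the words-versus-bytes conversion cannot be missed. */
bitfield_word_t *bitfield_alloc(size_t bits)
{
    return calloc(words_for_bits(bits), sizeof(bitfield_word_t));
}
```

With plain `malloc()`, the equivalent size argument would be `words_for_bits(bits) * sizeof(bitfield_word_t)`.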
Alex Young
d146102c2c Cherry-pick extra toolchain Makefile options 2014-02-26 15:56:41 +00:00
Alex Young
5551373073 Merge 2014-02-26 15:37:44 +00:00
Alex Young
77f333423b Apply Michel's tidy-ups 2014-02-26 15:19:03 +00:00
Alex Young
ffa45879d7 Pull back the changelog generation to the simplest thing that can possibly work 2014-02-25 17:24:25 +00:00
Alex Young
2fa1ce8e6b Tweak changelog generation not to skip commits since last tag 2014-02-25 16:35:51 +00:00
nick
6f540ce238 proxy: Turn on TCP_CORK
Now that we're using NODELAY, we should definitely use cork around
writes to the upstream server. This prevents each partial write()
from being its own packet, which would be terrible if it actually
happened with any regularity (we'd mostly see it when the kernel
is stressed, and write() is progressing a few bytes at a time as
a result)
2014-02-25 16:00:48 +00:00
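A rough sketch of corking around a header-plus-body write, assuming Linux (TCP_CORK is Linux-specific). The names `tcp_set_cork`, `write_all` and `send_corked` are hypothetical; the signature of the project's own `tcp_cork` helper is not shown on this page.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE 1
#endif
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: set (on=1) or clear (on=0) TCP_CORK. Returns 0 or -1. */
int tcp_set_cork(int fd, int on)
{
    return setsockopt(fd, IPPROTO_TCP, TCP_CORK, &on, sizeof(on));
}

/* Write the whole buffer, continuing after short writes. */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            return -1;
        }
        p += n;
        len -= (size_t) n;
    }
    return 0;
}

/* Cork, write header and body, then uncork so the kernel can coalesce
 * the partial writes into full packets before sending. */
int send_corked(int fd, const void *hdr, size_t hdr_len,
                const void *body, size_t body_len)
{
    if (tcp_set_cork(fd, 1) < 0) {
        return -1;
    }
    int rc = write_all(fd, hdr, hdr_len);
    if (rc == 0) {
        rc = write_all(fd, body, body_len);
    }
    /* Always uncork, even after a failed write, to flush queued data. */
    if (tcp_set_cork(fd, 0) < 0) {
        return -1;
    }
    return rc;
}
```

The sketch does not retry on EINTR; real code with signal handlers installed would probably want to.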
nick
f9a3447bc9 proxy: Turn on TCP_NODELAY for the proxy->upstream leg
Nagle doesn't actually affect us too badly here, as we don't write
the header and then the data in two separate calls under normal
circumstances, which is the pathological case, but we should have
NODELAY on, regardless
2014-02-25 15:59:05 +00:00
nick
7806ec11ee client: cork/uncork around NBD_REQUEST_READ responses
We don't cork/uncork around NBD_REQUEST_WRITE responses because
they're only 16 bytes, and we're using blocking writes.
2014-02-25 15:45:41 +00:00
nick
1817c13acb sockutil: Add a tcp_cork helper 2014-02-25 15:44:46 +00:00
nick
97c8d7a358 Remove a compile-time optional selection of O_DIRECT (was never used)
The mmap() manpage tells us to avoid using O_DIRECT with mmap() - so
do so.
2014-02-24 13:47:29 +00:00
Alex Young
8cf92af900 Call srand() to make sure request handles are properly randomised 2014-02-24 12:20:50 +00:00
Alex Young
5185be39c9 Merge 2014-02-24 11:25:46 +00:00
Alex Young
374b4c616e Remove unreachable code to make -Wunreachable-code on clang useful. 2014-02-24 11:23:09 +00:00
Alex Young
50ec8fb7cc Depend on either libev4 or libev3, whichever is available 2014-02-24 11:22:26 +00:00
Alex Young
5fc9ad6fd8 Add some build-depends which make doc needs 2014-02-21 21:40:55 +00:00
Alex Young
85c463c4bd Add asciidoc as a Build-Depends 2014-02-21 20:46:44 +00:00
Alex Young
278a3151a8 Update Rakefile to generate debian/changelog.
`rake changelog` and a commit should be run after each `hg tag`.
2014-02-21 19:58:02 +00:00
Alex Young
0ea66b1e04 Added tag 0.1.1 for changeset 303f6859295d 2014-02-21 19:54:25 +00:00
Alex Young
83e3d65be9 Update the Makefile to work with dpkg-buildpackage 2014-02-21 19:39:27 +00:00
Alex Young
4f31bd9340 Switch from a rake-based build to a make-based build.
This commit beefs up the Makefile to do the build, instead of the
Rakefile.

It also removes from the Rakefile the dependency on rake_utils, which
should mean it's ok to build in a schroot.

The files are reorganised to make the Makefile rules more tractable,
although the reorganisation reveals a problem with our current code
organisation.

The problem is that the proxy-specific code transitively depends on the
server code via flexnbd.h, which has a circular dependency on the server
and client structs. This should be broken in a future commit by
separating the flexnbd struct into a shared config struct and
server-specific parts, so that the server code can be moved into
src/server to more accurately show the functional dependencies.
2014-02-21 19:10:55 +00:00
nick
0baf93fd7b proxy: Fix a read corruption issue caused by us failing to reset needles on timeout 2014-02-11 20:43:44 +00:00
nick
175f19b3e7 client: Add a cork TODO pair 2014-02-11 15:22:54 +00:00
nick
8d56316548 client: Start checking for exceptions on the client socket 2014-02-11 14:32:12 +00:00
nick
27f2cc7083 Some debug and whitespace tweaks 2014-02-11 14:31:58 +00:00
nick
8084a41ad2 flexnbd client: Catch a few cases where the killswitch wasn't disarmed 2014-01-28 11:45:27 +00:00
nick
5ca5858929 Increase a timeout on a test to handle slow unlink calls on other filesystems 2014-01-22 12:21:49 +00:00
nick
afcc07a181 Fix stop signal logic broken by the killswitch 2014-01-22 12:16:09 +00:00
nick
dcead04cf6 Fix up the check_util test once more 2014-01-22 12:10:34 +00:00
nick
4f7f5f1745 Fix a few dangling bits in client.h 2014-01-22 12:01:42 +00:00
nick
976e9ba07f Automated merge with ssh://dev.bytemark.co.uk//repos/flexnbd-c 2014-01-22 11:49:26 +00:00
nick
91d9531a60 flexnbd serve: Make the killswitch per-client-thread
This is a bit tricky, but calling shutdown() on a socket in a signal
handler is safe, and (at least in linux) appears to cause any read()
or write() calls blocked on that socket to return, even with SA_RESTART.

I'm not confident enough about the rest of flexnbd's syscall error
handling to turn SA_RESTART off for this signal...
2014-01-22 11:49:21 +00:00
nick
905d66af77 Rework a test 2014-01-22 11:45:35 +00:00
nick
eee7c9644c Another fedora build fix 2014-01-22 11:42:00 +00:00
nick
ce5c51cdcf Fix a test case 2014-01-22 11:40:19 +00:00
nick
c6c53c63ba Fix compilation on fedora 2014-01-22 10:39:29 +00:00
Tristan Heaven
20bd58749e Fix help_text errors for break and status modes 2013-11-07 16:45:04 +00:00
nick
866bf835e6 tests: Fix an uninitialized memory access 2013-10-30 22:46:49 +00:00
nick
53cbe14556 mirror: lengthen the request timeout to 60 seconds
This is complicated slightly by a need to keep the tests fast, so
we introduce an environment variable that can override the constant
2013-10-30 22:45:12 +00:00
nick
cd3281f62d acl: Make some compilers happy 2013-10-30 22:44:15 +00:00
nick
1e5457fed0 mirror: Couple of tiny cleanups 2013-10-30 22:04:41 +00:00
nick
0753369b77 mirror: Turn off the 'begin' timer before continuing 2013-10-30 20:25:50 +00:00
nick
9d9ae40953 Increase loglevel of some allocation map messages 2013-10-30 16:40:32 +00:00
nick
65d4f581b9 mirror: Clean up bps calculation slightly 2013-10-24 15:11:55 +01:00
nick
77c71ccf09 mirror: Ensure the bitset is actually disabled on mirror error 2013-10-23 16:18:00 +01:00
nick
97a923afdf mirror: Don't start migrating until the allocation map is built
There is a fun race that can happen if we begin migrating while the
allocation map is still building. We call bitset_enable_stream()
when the migration begins, which causes the builder to start putting
events into the stream. This is bad all by itself, as it slows the
migration down for no reason, but the stream is a limited-size queue
and there are situations (migration fails and is restarted) where we
can end up with the queue full and nobody able to empty it, freezing
the whole thing.
2013-10-23 15:58:47 +01:00
nick
335261869d mirror: Don't count bytes transferred for the purposes of keeping the stream empty as part of our bwlimit
This prevents a fairly nasty situation occurring where the rate of change on the disc is high enough that
just servicing it generates enough traffic to keep us over the bwlimit threshold indefinitely. That would
cause us to sleep during the only windows we'd ordinarily have to advance the offset.
2013-10-23 15:26:28 +01:00
nick
8cf9cae8c0 mirror: Don't sleep if our stream is filling up 2013-10-23 14:38:27 +01:00
nick
6986c70888 bitset: Swap pthread_cond_broadcast for pthread_cond_signal
Normally we'll only have one thread waiting anyway, but there's no
point activating a race here in the cases where we have > 1 waiting,
so signal is what we want.
2013-09-24 15:28:58 +01:00
nick
4b9ded0e1d bitset: More-efficient implementation of bitset_stream_queued_bytes
Rather than iterating the entire queue every time this function is
called, we instead take a small hit on enqueue and dequeue to keep
a running byte total keyed by event type that we can return.
2013-09-24 15:27:17 +01:00
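The running-total approach described in this commit might look roughly like the sketch below. Everything here is illustrative: the event types, the fixed-size ring buffer and the function names are assumptions, and the locking that a real multi-threaded queue needs is left out.

```c
#include <stddef.h>
#include <stdint.h>

enum event_type { EVENT_SET = 0, EVENT_UNSET = 1, EVENT_TYPE_COUNT };

struct event {
    enum event_type type;
    uint64_t        len;      /* bytes covered by this event */
};

#define QUEUE_CAP 8

struct event_queue {
    struct event entries[QUEUE_CAP];
    size_t       head;
    size_t       count;
    /* Running totals, one per event type, updated on push and pop. */
    uint64_t     queued_bytes[EVENT_TYPE_COUNT];
};

int queue_push(struct event_queue *q, struct event ev)
{
    if (q->count == QUEUE_CAP) {
        return -1;                              /* queue full */
    }
    q->entries[(q->head + q->count) % QUEUE_CAP] = ev;
    q->count++;
    q->queued_bytes[ev.type] += ev.len;         /* small cost on enqueue */
    return 0;
}

int queue_pop(struct event_queue *q, struct event *out)
{
    if (q->count == 0) {
        return -1;                              /* queue empty */
    }
    *out = q->entries[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    q->queued_bytes[out->type] -= out->len;     /* small cost on dequeue */
    return 0;
}

/* O(1) lookup instead of walking every queued entry. */
uint64_t queue_queued_bytes(const struct event_queue *q, enum event_type t)
{
    return q->queued_bytes[t];
}
```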
nick
b177faacd6 mirror: Reduce the mirror convergence window to 5 seconds, from 60
Also remove some obsolete constants
2013-09-24 14:42:21 +01:00
nick
96e60a4a29 Added tag 0.1.0 for changeset acad9e9df53c 2013-09-24 12:27:29 +01:00
70 changed files with 3783 additions and 936 deletions

9
.gitignore vendored Normal file

@@ -0,0 +1,9 @@
**/*.o
**/*~
flexnbd
build/
pkg/
**/*.orig
**/.*.swp
cscope.out
valgrind.out


@@ -1,9 +0,0 @@
.o$
~$
^flexnbd$
^build/
^pkg/
\.orig$
.*\.swp$
cscope.out$
valgrind.out$

134
Makefile

@@ -1,10 +1,134 @@
 #!/usr/bin/make -f
-all:
-	rake build
-all-debug:
-	DEBUG=1 rake build
+VPATH=src:tests/unit
+DESTDIR?=/
+PREFIX?=/usr/local/bin
+INSTALLDIR=$(DESTDIR)/$(PREFIX)
+ifdef DEBUG
+CFLAGS_EXTRA=-g -DDEBUG
+LDFLAGS_EXTRA=-g
+else
+CFLAGS_EXTRA=-O2
+endif
+CFLAGS_EXTRA += -fPIC --std=gnu99
+LDFLAGS_EXTRA += -Wl,--relax,--gc-sections
+TOOLCHAIN := $(shell $(CC) --version|awk '/Debian/ {print "debian";exit;}')
+#
+# This bit adds extra flags depending of the distro, and the
+# architecture. To make sure debian packages have the right
+# set of 'native' flags on them
+#
+ifeq ($(TOOLCHAIN),debian)
+DEBARCH := $(shell dpkg-architecture -qDEB_BUILD_ARCH)
+ifeq ($(DEBARCH),$(filter $(DEBARCH),amd64 i386))
+CFLAGS_EXTRA += -march=native
+endif
+ifeq ($(DEBARCH),armhf)
+CFLAGS_EXTRA += -march=armv7-a -mtune=cortex-a8 -mfpu=neon
+endif
+LDFLAGS_EXTRA += -L$(LIB) -Wl,-rpath,${shell readlink -f ${LIB}}
+else
+LDFLAGS_EXTRA += -L$(LIB) -Wl,-rpath-link,$(LIB)
+endif
+# The -Wunreachable-code warning is only implemented in clang, but it
+# doesn't break anything for gcc to see it.
+WARNINGS=-Wall \
+-Wextra \
+-Werror-implicit-function-declaration \
+-Wstrict-prototypes \
+-Wno-missing-field-initializers \
+-Wunreachable-code
+CCFLAGS=-D_GNU_SOURCE=1 $(WARNINGS) $(CFLAGS_EXTRA) $(CFLAGS)
+LLDFLAGS=-lm -lrt -lev $(LDFLAGS_EXTRA) $(LDFLAGS)
+CC?=gcc
+LIBS=-lpthread
+INC=-I/usr/include/libev -Isrc/common -Isrc/server -Isrc/proxy
+COMPILE=$(CC) $(INC) -c $(CCFLAGS)
+SAVEDEP=$(CC) $(INC) -MM $(CCFLAGS)
+LINK=$(CC) $(LLDFLAGS) -Isrc $(LIBS)
+LIB=build/
+EXISTING_OBJS := $(wildcard build/*.o)
+-include $(EXISTING_OBJS:.o=.d)
+COMMON_SRC := $(wildcard src/common/*.c)
+SERVER_SRC := $(wildcard src/server/*.c)
+PROXY_SRC := $(wildcard src/proxy/*.c)
+COMMON_OBJ := $(COMMON_SRC:src/%.c=build/%.o)
+SERVER_OBJ := $(SERVER_SRC:src/%.c=build/%.o)
+PROXY_OBJ := $(PROXY_SRC:src/%.c=build/%.o)
+SRCS := $(COMMON_SRC) $(SERVER_SRC) $(PROXY_SRC)
+OBJS := $(COMMON_OBJ) $(SERVER_OBJ) $(PROXY_OBJ)
+all: build/flexnbd build/flexnbd-proxy doc
+build/%.o: %.c
+	mkdir -p $(dir $@)
+	$(COMPILE) $< -o $@
+	$(SAVEDEP) $< > build/$*.d
+objs: $(OBJS)
+build/flexnbd: $(COMMON_OBJ) $(SERVER_OBJ) build/main.o
+	$(LINK) $^ -o $@
+build/flexnbd-proxy: $(COMMON_OBJ) $(PROXY_OBJ) build/proxy-main.o
+	$(LINK) $^ -o $@
+server: build/flexnbd
+proxy: build/flexnbd-proxy
+CHECK_SRC := $(wildcard tests/unit/*.c)
+CHECK_OBJ := $(CHECK_SRC:tests/unit/%.c=build/tests/%.o)
+# Why can't we reuse the build/%.o rule above? Not sure.
+build/tests/%.o: tests/unit/%.c
+	mkdir -p $(dir $@)
+	$(COMPILE) $< -o $@
+	$(SAVEDEP) $< > build/tests/$*.d
+CHECK_BINS := $(CHECK_OBJ:build/tests/%.o=build/tests/%)
+build/tests/%: build/tests/%.o $(OBJS)
+	$(LINK) $^ -o $@ -lcheck
+check_objs: $(CHECK_OBJ)
+check_bins: $(CHECK_BINS)
+check: $(CHECK_BINS)
+	for bin in $^; do $$bin; done
+build/flexnbd.1: README.txt
+	a2x --destination-dir build --format manpage $<
+build/flexnbd-proxy.1: README.proxy.txt
+	a2x --destination-dir build --format manpage $<
+# If we don't pipe to file, gzip clobbers the original, causing make
+# to rebuild each time
+%.1.gz: %.1
+	gzip -c -f $< > $@
+server-man: build/flexnbd.1.gz
+proxy-man: build/flexnbd-proxy.1.gz
+doc: server-man proxy-man
+install:
+	mkdir -p $(INSTALLDIR)
+	cp build/flexnbd build/flexnbd-proxy $(INSTALLDIR)
 clean:
-	rake clean
+	rm -rf build/*
+.PHONY: clean objs check_objs all server proxy check_bins check server-man proxy-man doc


@@ -28,7 +28,8 @@ USAGE
 -----
 $ flexnbd-proxy --addr <ADDR> [ --port <PORT> ]
---conn-addr <ADDR> --conn-port <PORT> [--bind <ADDR>] [option]*
+--conn-addr <ADDR> --conn-port <PORT>
+[--bind <ADDR>] [--cache[=<CACHE_BYTES>]] [option]*
 Proxy requests from an NBD client to an NBD server, resiliently. Only one
 client can be connected at a time, and ACLs cannot be applied to the client, as they
@@ -73,6 +74,10 @@ Options
 *--conn-port, -P PORT*:
 The port of the NBD server to connect to. Required.
+*--cache, -c=CACHE_BYTES*:
+If given, the size in bytes of read cache to use. CACHE_BYTES
+defaults to 4096.
 *--help, -h* :
 Show command or global help.
@@ -154,6 +159,29 @@ The proxy notices and reconnects, fulfiling any request it has in its buffer.
 The data in myfile has been moved between physical servers without the nbd
 client process having to be disturbed at all.
+READ CACHE
+----------
+If the --cache option is given at the command line, either without an
+argument or with an argument greater than 0, flexnbd-proxy will use a
+read-ahead cache. The cache as currently implemented doubles each read
+request size, up to a maximum of 2xCACHE_BYTES, and retains the latter
+half in a buffer. If the next read request from the client exactly
+matches the region held in the buffer, flexnbd-proxy responds from the
+cache without making a request to the server.
+This pattern is designed to match sequential reads, such as those
+performed by a booting virtual machine.
+Note: If specifying a cache size, you *must* use this form:
+nbd-client$ flexnbd-proxy --cache=XXXX
+That is, the '=' is required. This is a limitation of getopt-long.
+If no cache size is given, a size of 4096 bytes is assumed. Caching can
+be explicitly disabled by setting a size of 0.
 BUGS
 ----


@@ -25,7 +25,7 @@ COMMANDS
serve serve
~~~~~ ~~~~~
$ flexnbd serve --addr <ADDR> --port <PORT> --file <FILE> $ flexnbd serve --addr <ADDR> --port <PORT> --file <FILE>
[--sock <SOCK>] [--default-deny] [global option]* [acl entry]* [--sock <SOCK>] [--default-deny] [-k] [global option]* [acl entry]*
Serve a file. If any ACL entries are given (which should be IP Serve a file. If any ACL entries are given (which should be IP
addresses), only those clients listed will be permitted to connect. addresses), only those clients listed will be permitted to connect.
@@ -55,6 +55,12 @@ Options
empty ACL will let no clients connect. If it is not given, an empty ACL will let no clients connect. If it is not given, an
empty ACL will let any client connect. empty ACL will let any client connect.
*--killswitch, -k*:
If set, we implement a 2-minute timeout on NBD requests and
responses. If a request takes longer than that to complete,
the client is disconnected. This is useful to keep broken
clients from breaking migrations, among other things.
listen listen
~~~~~~ ~~~~~~

316
Rakefile

@@ -1,85 +1,34 @@
$: << '../rake_utils/lib' # encoding: utf-8
require 'rake_utils/debian'
include RakeUtils::DSL
CC=ENV['CC'] || "gcc" def make(*targets)
sh "make #{targets.map{|t| t.to_s}.join(" ")}"
DEBUG = ENV.has_key?('DEBUG') &&
%w|yes y ok 1 true t|.include?(ENV['DEBUG'])
ALL_SOURCES = FileList['src/*']
PROXY_ONLY_SOURCES = FileList['src/{proxy-main,proxy}.c']
PROXY_ONLY_OBJECTS = PROXY_ONLY_SOURCES.pathmap( "%{^src,build}X.o" )
SOURCES = ALL_SOURCES.select { |c| c =~ /\.c$/ } - PROXY_ONLY_SOURCES
OBJECTS = SOURCES.pathmap( "%{^src,build}X.o" ) - PROXY_ONLY_OBJECTS
PROXY_SOURCES = FileList['src/{ioutil,nbdtypes,readwrite,sockutil,util,parse}.c'] + PROXY_ONLY_SOURCES
PROXY_OBJECTS = PROXY_SOURCES.pathmap( "%{^src,build}X.o" )
TEST_SOURCES = FileList['tests/unit/*.c']
TEST_OBJECTS = TEST_SOURCES.pathmap( "%{^tests/unit,build/tests}X.o" )
LIBS = %w( pthread )
LDFLAGS = ["-lrt -lev"]
CCFLAGS = %w(
-D_GNU_SOURCE=1
-Wall
-Wextra
-Werror-implicit-function-declaration
-Wstrict-prototypes
-Wno-missing-field-initializers
) + # Added -Wno-missing-field-initializers to shut GCC up over {0} struct initialisers
[ENV['CFLAGS']]
LIBCHECK = File.exists?("/usr/lib/libcheck.a") ?
"/usr/lib/libcheck.a" :
"/usr/local/lib/libcheck.a"
TEST_MODULES = Dir["tests/unit/check_*.c"].map { |n|
File.basename( n )[%r{check_(.+)\.c},1] }
if DEBUG
LDFLAGS << ["-g"]
CCFLAGS << ["-g -DDEBUG"]
else
CCFLAGS << "-O2"
end end
def maketask( opts )
case opts
when Symbol
maketask opts => opts
else
opts.each do |name, targets|
task( name ){make *[*targets]}
end
end
end
desc "Build the binary and man page" desc "Build the binary and man page"
task :build => [:flexnbd, :flexnbd_proxy, :man] maketask :build => [:all, :doc]
task :default => :build
desc "Build just the flexnbd binary" desc "Build just the flexnbd binary"
task :flexnbd => "build/flexnbd" maketask :flexnbd => [:server]
file "build/flexnbd" => :flexnbd
desc "Build just the flexnbd-proxy binary" desc "Build just the flexnbd-proxy binary"
task :flexnbd_proxy => "build/flexnbd-proxy" maketask :flexnbd_proxy => [:proxy]
file "build/flexnbd-proxy" => :flexnbd_proxy
def check(m)
"build/tests/check_#{m}"
end
file "README.txt"
file "README.proxy.txt"
def manpage(name, src)
FileUtils.mkdir_p( "build" )
sh "a2x --destination-dir build --format manpage #{src}"
sh "gzip -f build/#{name}"
end
file "build/flexnbd.1.gz" => "README.txt" do
manpage("flexnbd.1", "README.txt")
end
file "build/flexnbd-proxy.1.gz" => "README.proxy.txt" do
manpage("flexnbd-proxy.1", "README.proxy.txt")
end
desc "Build just the man page" desc "Build just the man page"
task :man => ["build/flexnbd.1.gz", "build/flexnbd-proxy.1.gz"] maketask :man => :doc
namespace "test" do namespace "test" do
@@ -87,226 +36,25 @@ namespace "test" do
task 'run' => ["unit", "scenarios"] task 'run' => ["unit", "scenarios"]
desc "Build C tests" desc "Build C tests"
task 'build' => TEST_MODULES.map { |n| check n} maketask :build => :check_bins
TEST_MODULES.each do |m|
desc "Run tests for #{m}"
task "check_#{m}" => check(m) do
sh check m
end
end
desc "Run C tests" desc "Run C tests"
task 'unit' => 'build' do maketask :unit => :check
TEST_MODULES.each do |n|
ENV['EF_DISABLE_BANNER'] = '1'
sh check n
end
end
desc "Run NBD test scenarios" desc "Run NBD test scenarios"
task 'scenarios' => ['build/flexnbd', 'build/flexnbd-proxy'] do task 'scenarios' => ["build/flexnbd", "build/flexnbd-proxy"] do
sh "cd tests/acceptance; ruby nbd_scenarios -v" sh "cd tests/acceptance && RUBYOPT='-I.' ruby nbd_scenarios -v"
end end
end end
def gcc_compile( target, source )
FileUtils.mkdir_p File.dirname( target )
sh "#{CC} -Isrc -c #{CCFLAGS.join(' ')} -o #{target} #{source} "
end
def gcc_link(target, objects)
FileUtils.mkdir_p File.dirname( target )
sh "#{CC} #{LDFLAGS.join(' ')} "+
" -Isrc " +
" -o #{target} "+
objects.join(" ") +
" "+LIBS.map { |l| "-l#{l}" }.join(" ")
end
def headers(c)
`#{CC} -Isrc -MM #{c}`.gsub("\\\n", " ").split(" ")[2..-1]
end
rule 'build/flexnbd-proxy' => PROXY_OBJECTS do |t|
gcc_link(t.name, t.sources)
end
rule 'build/flexnbd' => OBJECTS do |t|
gcc_link(t.name, t.sources)
end
file check("client") =>
%w{build/tests/check_client.o
build/self_pipe.o
build/nbdtypes.o
build/flexnbd.o
build/flexthread.o
build/control.o
build/readwrite.o
build/parse.o
build/client.o
build/serve.o
build/acl.o
build/ioutil.o
build/mbox.o
build/mirror.o
build/status.o
build/sockutil.o
build/util.o} do |t|
gcc_link t.name, t.prerequisites + [LIBCHECK]
end
file check("acl") =>
%w{build/tests/check_acl.o
build/parse.o
build/acl.o
build/util.o} do |t|
gcc_link t.name, t.prerequisites + [LIBCHECK]
end
file check( "util" ) =>
%w{build/tests/check_util.o
build/util.o
build/self_pipe.o} do |t|
gcc_link t.name, t.prerequisites + [LIBCHECK]
end
file check("serve") =>
%w{build/tests/check_serve.o
build/self_pipe.o
build/nbdtypes.o
build/control.o
build/readwrite.o
build/parse.o
build/client.o
build/flexthread.o
build/serve.o
build/flexnbd.o
build/mirror.o
build/status.o
build/acl.o
build/mbox.o
build/ioutil.o
build/sockutil.o
build/util.o} do |t|
gcc_link t.name, t.prerequisites + [LIBCHECK]
end
file check("status") =>
%w{
build/tests/check_status.o
build/self_pipe.o
build/nbdtypes.o
build/control.o
build/readwrite.o
build/parse.o
build/client.o
build/flexthread.o
build/serve.o
build/flexnbd.o
build/mirror.o
build/status.o
build/acl.o
build/mbox.o
build/ioutil.o
build/sockutil.o
build/util.o
} do |t|
gcc_link t.name, t.prerequisites + [LIBCHECK]
end
file check("readwrite") =>
%w{build/tests/check_readwrite.o
build/readwrite.o
build/client.o
build/self_pipe.o
build/serve.o
build/parse.o
build/acl.o
build/flexthread.o
build/control.o
build/flexnbd.o
build/mirror.o
build/status.o
build/nbdtypes.o
build/mbox.o
build/ioutil.o
build/sockutil.o
build/util.o} do |t|
gcc_link t.name, t.prerequisites + [LIBCHECK]
end
file check("flexnbd") =>
%w{build/tests/check_flexnbd.o
build/flexnbd.o
build/ioutil.o
build/sockutil.o
build/util.o
build/control.o
build/mbox.o
build/flexthread.o
build/status.o
build/self_pipe.o
build/client.o
build/acl.o
build/parse.o
build/nbdtypes.o
build/readwrite.o
build/mirror.o
build/serve.o} do |t|
gcc_link t.name, t.prerequisites + [LIBCHECK]
end
file check("control") =>
%w{build/tests/check_control.o} + OBJECTS - ["build/main.o", 'build/proxy-main.o', 'build/proxy.o'] do |t|
gcc_link t.name, t.prerequisites + [LIBCHECK]
end
(TEST_MODULES- %w{status control flexnbd acl client serve readwrite util}).each do |m|
tgt = "build/tests/check_#{m}.o"
maybe_obj_name = "build/#{m}.o"
# Take it out in case we're testing one of the utils
deps = ["build/ioutil.o", "build/util.o", "build/sockutil.o"] - [maybe_obj_name]
# Add it back in if it's something we need to compile
deps << maybe_obj_name if OBJECTS.include?( maybe_obj_name )
file check( m ) => deps + [tgt] do |t|
gcc_link(t.name, deps + [tgt, LIBCHECK])
end
end
OBJECTS.zip( SOURCES ).each do |o,c|
file o => [c]+headers(c) do |t| gcc_compile( o, c ) end
end
PROXY_ONLY_OBJECTS.zip( PROXY_ONLY_SOURCES).each do |o, c|
file o => [c]+headers(c) do |t| gcc_compile( o, c ) end
end
TEST_OBJECTS.zip( TEST_SOURCES ).each do |o,c|
file o => [c] + headers(c) do |t| gcc_compile( o, c ) end
end
desc "Remove all build targets, binaries and temporary files" desc "Remove all build targets, binaries and temporary files"
task :clean do maketask :clean
sh "rm -rf *~ build"
end file "debian/changelog" do
FileUtils.mkdir_p "debian"
namespace :pkg do sh "hg log --style=changelog.template > debian/changelog"
deb do |t|
t.code_files = ALL_SOURCES + ["Rakefile", "README.txt", "README.proxy.txt"]
t.pkg_name = "flexnbd"
t.generate_changelog!
end
end end
desc "Generate the changelog"
task :changelog => "debian/changelog"

2653
debian/changelog vendored

File diff suppressed because it is too large

11
debian/control vendored

@@ -1,14 +1,15 @@
 Source: flexnbd
-Section: unknown
+Section: web
 Priority: extra
-Maintainer: Alex Young <alex@bytemark.co.uk>
-Build-Depends: cdbs, debhelper (>= 7.0.50), ruby, rake, gcc, libev-dev
+Maintainer: Patrick J Cherry <patrick@bytemark.co.uk>
+Build-Depends: debhelper (>= 7.0.50), ruby, rake, gcc, libev-dev,
+ asciidoc, libxml2-utils, xsltproc, xmlto, check
 Standards-Version: 3.8.1
-Homepage: http://bigv.io/
+Homepage: https://github.com/BytemarkHosting/flexnbd-c
 Package: flexnbd
 Architecture: any
-Depends: ${shlibs:Depends}, ${misc:Depends}, libev3
+Depends: ${shlibs:Depends}, ${misc:Depends}, libev4 | libev3
 Description: FlexNBD server
  An NBD server offering push-mirroring and intelligent sparse file handling

15
debian/rules vendored

@@ -7,12 +7,13 @@
 %:
 	dh $@
-override_dh_auto_build:
-	rake build
-override_dh_auto_clean:
-	rake clean
-.PHONY: override_dh_strip
 override_dh_strip:
 	dh_strip --dbg-package=flexnbd-dbg
+#
+# TODO: The ruby test suites don't work during building in a chroot, so leave
+# them out for now.
+#
+#override_dh_auto_test:
+#	rake test:run


@@ -1 +1 @@
-3.0 (native)
+3.0 (quilt)


@@ -31,8 +31,6 @@ int build_allocation_map(struct bitset * allocation_map, int fd)
 for (offset = 0; offset < allocation_map->size; ) {
-unsigned int i;
 fiemap->fm_start = offset;
 fiemap->fm_length = max_length;
@@ -49,7 +47,7 @@ int build_allocation_map(struct bitset * allocation_map, int fd)
 return 0; /* it's up to the caller to free the map */
 }
 else {
-for ( i = 0; i < fiemap->fm_mapped_extents; i++ ) {
+for ( unsigned int i = 0; i < fiemap->fm_mapped_extents; i++ ) {
 bitset_set_range( allocation_map,
 fiemap->fm_extents[i].fe_logical,
 fiemap->fm_extents[i].fe_length );
@@ -71,22 +69,23 @@ int build_allocation_map(struct bitset * allocation_map, int fd)
 }
 }
-debug("Successfully built allocation map");
+info("Successfully built allocation map");
 return 1;
 }
-int open_and_mmap(const char* filename, int* out_fd, off64_t *out_size, void **out_map)
+int open_and_mmap(const char* filename, int* out_fd, uint64_t *out_size, void **out_map)
 {
+/*
+ * size and out_size are intentionally of different types.
+ * lseek64() uses off64_t to signal errors in the sign bit.
+ * Since we check for these errors before trying to assign to
+ * *out_size, we know *out_size can never go negative.
+ */
 off64_t size;
-/* O_DIRECT seems to be intermittently supported. Leaving it as
- * a compile-time option for now. */
-#ifdef DIRECT_IO
-*out_fd = open(filename, O_RDWR | O_DIRECT | O_SYNC );
-#else
+/* O_DIRECT should not be used with mmap() */
 *out_fd = open(filename, O_RDWR | O_SYNC );
-#endif
 if (*out_fd < 1) {
 warn("open(%s) failed: does it exist?", filename);
@@ -109,8 +108,11 @@ int open_and_mmap(const char* filename, int* out_fd, off64_t *out_size, void **o
 warn("mmap64() failed");
 return -1;
 }
+debug("opened %s size %ld on fd %d @ %p", filename, size, *out_fd, *out_map);
+}
+else {
+debug("opened %s size %ld on fd %d", filename, size, *out_fd);
 }
-debug("opened %s size %ld on fd %d @ %p", filename, size, *out_fd, *out_map);
 return 0;
 }
@@ -139,7 +141,7 @@ int readloop(int filedes, void *buffer, size_t size)
 ssize_t result = read(filedes, buffer+readden, size-readden);
 if ( result == 0 /* EOF */ ) {
-warn( "end-of-file detected while reading" );
+warn( "end-of-file detected while reading after %i bytes", readden );
 return -1;
 }
@@ -347,4 +349,3 @@ ssize_t iobuf_write( int fd, struct iobuf *iobuf )
 return count;
 }


@@ -65,7 +65,7 @@ int read_lines_until_blankline(int fd, int max_line_length, char ***lines);
  * ''out_size'' and the address of the mmap in ''out_map''. If anything goes
  * wrong, returns -1 setting errno, otherwise 0.
  */
-int open_and_mmap( const char* filename, int* out_fd, off64_t *out_size, void **out_map);
+int open_and_mmap( const char* filename, int* out_fd, uint64_t* out_size, void **out_map);
 /** Check to see whether the given file descriptor is closed.


@@ -7,8 +7,9 @@ void mode(char* mode, int argc, char **argv);
#include <getopt.h> #include <getopt.h>
#define GETOPT_ARG(x,s) {(x), 1, 0, (s)} #define GETOPT_ARG(x,s) {(x), required_argument, 0, (s)}
#define GETOPT_FLAG(x,v) {(x), 0, 0, (v)} #define GETOPT_FLAG(x,v) {(x), no_argument, 0, (v)}
#define GETOPT_OPTARG(x,s) {(x), optional_argument, 0, (s)}
#define OPT_HELP "help" #define OPT_HELP "help"
#define OPT_ADDR "addr" #define OPT_ADDR "addr"
@@ -19,6 +20,7 @@ void mode(char* mode, int argc, char **argv);
#define OPT_FROM "from" #define OPT_FROM "from"
#define OPT_SIZE "size" #define OPT_SIZE "size"
#define OPT_DENY "default-deny" #define OPT_DENY "default-deny"
#define OPT_CACHE "cache"
#define OPT_UNLINK "unlink" #define OPT_UNLINK "unlink"
#define OPT_CONNECT_ADDR "conn-addr" #define OPT_CONNECT_ADDR "conn-addr"
#define OPT_CONNECT_PORT "conn-port" #define OPT_CONNECT_PORT "conn-port"
@@ -52,6 +54,7 @@ void mode(char* mode, int argc, char **argv);
#define GETOPT_FROM GETOPT_ARG( OPT_FROM, 'F' ) #define GETOPT_FROM GETOPT_ARG( OPT_FROM, 'F' )
#define GETOPT_SIZE GETOPT_ARG( OPT_SIZE, 'S' ) #define GETOPT_SIZE GETOPT_ARG( OPT_SIZE, 'S' )
#define GETOPT_BIND GETOPT_ARG( OPT_BIND, 'b' ) #define GETOPT_BIND GETOPT_ARG( OPT_BIND, 'b' )
#define GETOPT_CACHE GETOPT_OPTARG( OPT_CACHE, 'c' )
#define GETOPT_UNLINK GETOPT_ARG( OPT_UNLINK, 'u' ) #define GETOPT_UNLINK GETOPT_ARG( OPT_UNLINK, 'u' )
#define GETOPT_CONNECT_ADDR GETOPT_ARG( OPT_CONNECT_ADDR, 'C' ) #define GETOPT_CONNECT_ADDR GETOPT_ARG( OPT_CONNECT_ADDR, 'C' )
#define GETOPT_CONNECT_PORT GETOPT_ARG( OPT_CONNECT_PORT, 'P' ) #define GETOPT_CONNECT_PORT GETOPT_ARG( OPT_CONNECT_PORT, 'P' )


@@ -27,7 +27,7 @@ void nbd_r2h_request( struct nbd_request_raw *from, struct nbd_request * to )
{ {
to->magic = htobe32( from->magic ); to->magic = htobe32( from->magic );
to->type = htobe32( from->type ); to->type = htobe32( from->type );
memcpy( to->handle, from->handle, 8 ); to->handle.w = from->handle.w;
to->from = htobe64( from->from ); to->from = htobe64( from->from );
to->len = htobe32( from->len ); to->len = htobe32( from->len );
} }
@@ -36,7 +36,7 @@ void nbd_h2r_request( struct nbd_request * from, struct nbd_request_raw * to )
{ {
to->magic = be32toh( from->magic ); to->magic = be32toh( from->magic );
to->type = be32toh( from->type ); to->type = be32toh( from->type );
memcpy( to->handle, from->handle, 8 ); to->handle.w = from->handle.w;
to->from = be64toh( from->from ); to->from = be64toh( from->from );
to->len = be32toh( from->len ); to->len = be32toh( from->len );
} }
@@ -46,13 +46,13 @@ void nbd_r2h_reply( struct nbd_reply_raw * from, struct nbd_reply * to )
{ {
to->magic = htobe32( from->magic ); to->magic = htobe32( from->magic );
to->error = htobe32( from->error ); to->error = htobe32( from->error );
memcpy( to->handle, from->handle, 8 ); to->handle.w = from->handle.w;
} }
void nbd_h2r_reply( struct nbd_reply * from, struct nbd_reply_raw * to ) void nbd_h2r_reply( struct nbd_reply * from, struct nbd_reply_raw * to )
{ {
to->magic = be32toh( from->magic ); to->magic = be32toh( from->magic );
to->error = be32toh( from->error ); to->error = be32toh( from->error );
memcpy( to->handle, from->handle, 8 ); to->handle.w = from->handle.w;
} }


@@ -24,6 +24,11 @@
#include <linux/types.h> #include <linux/types.h>
#include <inttypes.h> #include <inttypes.h>
typedef union nbd_handle_t {
uint8_t b[8];
uint64_t w;
} nbd_handle_t;
/* The _raw types are the types as they appear on the wire. Non-_raw /* The _raw types are the types as they appear on the wire. Non-_raw
* types are in host-format. * types are in host-format.
* Conversion functions are _r2h_ for converting raw to host, and _h2r_ * Conversion functions are _r2h_ for converting raw to host, and _h2r_
@@ -39,7 +44,7 @@ struct nbd_init_raw {
struct nbd_request_raw { struct nbd_request_raw {
__be32 magic; __be32 magic;
__be32 type; /* == READ || == WRITE */ __be32 type; /* == READ || == WRITE */
char handle[8]; nbd_handle_t handle;
__be64 from; __be64 from;
__be32 len; __be32 len;
} __attribute__((packed)); } __attribute__((packed));
@@ -47,7 +52,7 @@ struct nbd_request_raw {
struct nbd_reply_raw { struct nbd_reply_raw {
__be32 magic; __be32 magic;
__be32 error; /* 0 = ok, else error */ __be32 error; /* 0 = ok, else error */
char handle[8]; /* handle you got from request */ nbd_handle_t handle; /* handle you got from request */
}; };
@@ -62,7 +67,7 @@ struct nbd_init {
struct nbd_request { struct nbd_request {
uint32_t magic; uint32_t magic;
uint32_t type; /* == READ || == WRITE || == DISCONNECT */ uint32_t type; /* == READ || == WRITE || == DISCONNECT */
char handle[8]; nbd_handle_t handle;
uint64_t from; uint64_t from;
uint32_t len; uint32_t len;
} __attribute__((packed)); } __attribute__((packed));
@@ -70,7 +75,7 @@ struct nbd_request {
struct nbd_reply { struct nbd_reply {
uint32_t magic; uint32_t magic;
uint32_t error; /* 0 = ok, else error */ uint32_t error; /* 0 = ok, else error */
char handle[8]; /* handle you got from request */ nbd_handle_t handle; /* handle you got from request */
}; };
void nbd_r2h_init( struct nbd_init_raw * from, struct nbd_init * to ); void nbd_r2h_init( struct nbd_init_raw * from, struct nbd_init * to );
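
The `nbd_handle_t` union introduced above lets an 8-byte handle be copied and compared as a single `uint64_t`. The sketch below illustrates why that is also safer than the `strncmp()` comparison it replaces in the client code: `strncmp()` stops at the first NUL byte, so two different handles can compare equal. The helper names are hypothetical and not part of the commit.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef union nbd_handle_t {
    uint8_t  b[8];
    uint64_t w;
} nbd_handle_t;

/* Compare two handles as opaque 8-byte values (hypothetical helper). */
static int nbd_handle_equal(const nbd_handle_t *a, const nbd_handle_t *b)
{
    assert(a != NULL && b != NULL);
    return a->w == b->w;
}

/* The older idiom, kept here for contrast: strncmp() treats the bytes as a
 * C string and stops comparing at the first NUL byte in either handle. */
static int nbd_handle_equal_strncmp(const nbd_handle_t *a, const nbd_handle_t *b)
{
    return strncmp((const char *)a->b, (const char *)b->b, 8) == 0;
}
```

For example, the handles `{1, 0, 2, ...}` and `{1, 0, 3, ...}` differ in their third byte, yet the `strncmp()` version reports them as equal. Because the handle is opaque to both ends, no byte-order conversion is needed when copying `w`.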


@@ -41,7 +41,7 @@ int socket_connect(struct sockaddr* to, struct sockaddr* from)
return fd; return fd;
} }
int nbd_check_hello( struct nbd_init_raw* init_raw, off64_t* out_size ) int nbd_check_hello( struct nbd_init_raw* init_raw, uint64_t* out_size )
{ {
if ( strncmp( init_raw->passwd, INIT_PASSWD, 8 ) != 0 ) { if ( strncmp( init_raw->passwd, INIT_PASSWD, 8 ) != 0 ) {
warn( "wrong passwd" ); warn( "wrong passwd" );
@@ -62,7 +62,7 @@ fail:
} }
int socket_nbd_read_hello( int fd, off64_t* out_size ) int socket_nbd_read_hello( int fd, uint64_t* out_size )
{ {
struct nbd_init_raw init_raw; struct nbd_init_raw init_raw;
@@ -101,12 +101,11 @@ int socket_nbd_write_hello(int fd, off64_t out_size)
return 1; return 1;
} }
void fill_request(struct nbd_request *request, int type, off64_t from, int len) void fill_request(struct nbd_request *request, int type, uint64_t from, uint32_t len)
{ {
request->magic = htobe32(REQUEST_MAGIC); request->magic = htobe32(REQUEST_MAGIC);
request->type = htobe32(type); request->type = htobe32(type);
((int*) request->handle)[0] = rand(); request->handle.w = (((uint64_t)rand()) << 32) | ((uint64_t)rand());
((int*) request->handle)[1] = rand();
request->from = htobe64(from); request->from = htobe64(from);
request->len = htobe32(len); request->len = htobe32(len);
} }
@@ -126,7 +125,7 @@ void read_reply(int fd, struct nbd_request *request, struct nbd_reply *reply)
if (reply->error != 0) { if (reply->error != 0) {
error("Server replied with error %d", reply->error); error("Server replied with error %d", reply->error);
} }
if (strncmp(request->handle, reply->handle, 8) != 0) { if (request->handle.w != reply->handle.w) {
error("Did not reply with correct handle"); error("Did not reply with correct handle");
} }
} }
@@ -149,7 +148,7 @@ void wait_for_data( int fd, int timeout_secs )
} }
void socket_nbd_read(int fd, off64_t from, int len, int out_fd, void* out_buf, int timeout_secs) void socket_nbd_read(int fd, uint64_t from, uint32_t len, int out_fd, void* out_buf, int timeout_secs)
{ {
struct nbd_request request; struct nbd_request request;
struct nbd_reply reply; struct nbd_reply reply;
@@ -173,7 +172,7 @@ void socket_nbd_read(int fd, off64_t from, int len, int out_fd, void* out_buf, i
} }
} }
void socket_nbd_write(int fd, off64_t from, int len, int in_fd, void* in_buf, int timeout_secs) void socket_nbd_write(int fd, uint64_t from, uint32_t len, int in_fd, void* in_buf, int timeout_secs)
{ {
struct nbd_request request; struct nbd_request request;
struct nbd_reply reply; struct nbd_reply reply;
@@ -213,10 +212,12 @@ int socket_nbd_disconnect( int fd )
} }
#define CHECK_RANGE(error_type) { \ #define CHECK_RANGE(error_type) { \
off64_t size;\ uint64_t size;\
int success = socket_nbd_read_hello(params->client, &size); \ int success = socket_nbd_read_hello(params->client, &size); \
if ( success ) {\ if ( success ) {\
if (params->from < 0 || (params->from + params->len) > size) {\ uint64_t endpoint = params->from + params->len; \
if (endpoint > size || \
endpoint < params->from ) { /* this happens on overflow */ \
fatal(error_type \ fatal(error_type \
" request %d+%d is out of range given size %d", \ " request %d+%d is out of range given size %d", \
params->from, params->len, size\ params->from, params->len, size\
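
With `from` and `size` now unsigned, the old `params->from < 0` test can never be true, so the new CHECK_RANGE detects wrap-around by checking whether the computed endpoint is smaller than the start. A minimal standalone sketch of the same test, using a hypothetical function name rather than the macro from the diff:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if [from, from+len) lies within an image of
 * `size` bytes, 0 otherwise. */
static int range_in_bounds(uint64_t from, uint32_t len, uint64_t size)
{
    uint64_t endpoint = from + len;    /* unsigned: wraps modulo 2^64 */

    if (endpoint < from) { return 0; } /* the addition wrapped around */
    return endpoint <= size;
}

/* A few worked cases. */
static void range_in_bounds_examples(void)
{
    assert(range_in_bounds(0, 4096, 4096));    /* exactly fills the image */
    assert(!range_in_bounds(1, 4096, 4096));   /* one byte past the end   */
    /* from + len wraps to a small number, which would otherwise pass. */
    assert(!range_in_bounds(UINT64_MAX - 10, 100, UINT64_MAX));
}
```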

23
src/common/readwrite.h Normal file

@@ -0,0 +1,23 @@
#ifndef READWRITE_H
#define READWRITE_H
#include <sys/types.h>
#include <sys/socket.h>
#include "nbdtypes.h"
int socket_connect(struct sockaddr* to, struct sockaddr* from);
int socket_nbd_read_hello(int fd, uint64_t* size);
int socket_nbd_write_hello(int fd, uint64_t size);
void socket_nbd_read(int fd, uint64_t from, uint32_t len, int out_fd, void* out_buf, int timeout_secs);
void socket_nbd_write(int fd, uint64_t from, uint32_t len, int out_fd, void* out_buf, int timeout_secs);
int socket_nbd_disconnect( int fd );
/* as you can see, we're slowly accumulating code that should really be in an
* NBD library */
void nbd_hello_to_buf( struct nbd_init_raw* buf, uint64_t out_size );
int nbd_check_hello( struct nbd_init_raw* init_raw, uint64_t* out_size );
#endif


@@ -63,7 +63,5 @@ void do_remote_command(char* command, char* socket_name, int argc, char** argv)
print_response( response ); print_response( response );
exit(atoi(response)); exit(atoi(response));
close(remote);
} }


@@ -51,7 +51,6 @@ struct self_pipe * self_pipe_create(void)
{ {
struct self_pipe *sig = xmalloc( sizeof( struct self_pipe ) ); struct self_pipe *sig = xmalloc( sizeof( struct self_pipe ) );
int fds[2]; int fds[2];
int fcntl_err;
if ( NULL == sig ) { return NULL; } if ( NULL == sig ) { return NULL; }
@@ -62,7 +61,7 @@ struct self_pipe * self_pipe_create(void)
} }
if ( fcntl( fds[0], F_SETFL, O_NONBLOCK ) || fcntl( fds[1], F_SETFL, O_NONBLOCK ) ) { if ( fcntl( fds[0], F_SETFL, O_NONBLOCK ) || fcntl( fds[1], F_SETFL, O_NONBLOCK ) ) {
fcntl_err = errno; int fcntl_err = errno;
while( close( fds[0] ) == -1 && errno == EINTR ); while( close( fds[0] ) == -1 && errno == EINTR );
while( close( fds[1] ) == -1 && errno == EINTR ); while( close( fds[1] ) == -1 && errno == EINTR );
free( sig ); free( sig );


@@ -39,7 +39,6 @@ const char* sockaddr_address_string( const struct sockaddr* sa, char* dest, size
struct sockaddr_un* un = ( struct sockaddr_un* ) sa; struct sockaddr_un* un = ( struct sockaddr_un* ) sa;
unsigned short real_port = ntohs( in->sin_port ); // common to in and in6 unsigned short real_port = ntohs( in->sin_port ); // common to in and in6
size_t size;
const char* ret = NULL; const char* ret = NULL;
memset( dest, 0, len ); memset( dest, 0, len );
@@ -57,7 +56,7 @@ const char* sockaddr_address_string( const struct sockaddr* sa, char* dest, size
} }
if ( NULL != ret && real_port > 0 && sa->sa_family != AF_UNIX ) { if ( NULL != ret && real_port > 0 && sa->sa_family != AF_UNIX ) {
size = strlen( dest ); size_t size = strlen( dest );
snprintf( dest + size, len - size, " port %d", real_port ); snprintf( dest + size, len - size, " port %d", real_port );
} }
@@ -75,6 +74,11 @@ int sock_set_tcp_nodelay( int fd, int optval )
return setsockopt( fd, IPPROTO_TCP, TCP_NODELAY, &optval, sizeof(optval) ); return setsockopt( fd, IPPROTO_TCP, TCP_NODELAY, &optval, sizeof(optval) );
} }
int sock_set_tcp_cork( int fd, int optval )
{
return setsockopt( fd, IPPROTO_TCP, TCP_CORK, &optval, sizeof(optval) );
}
int sock_set_nonblock( int fd, int optval ) int sock_set_nonblock( int fd, int optval )
{ {
int flags = fcntl( fd, F_GETFL ); int flags = fcntl( fd, F_GETFL );
@@ -96,7 +100,7 @@ int sock_try_bind( int fd, const struct sockaddr* sa )
{ {
int bind_result; int bind_result;
char s_address[256]; char s_address[256];
int retry = 1; int retry = 10;
sockaddr_address_string( sa, &s_address[0], 256 ); sockaddr_address_string( sa, &s_address[0], 256 );
@@ -122,8 +126,11 @@ int sock_try_bind( int fd, const struct sockaddr* sa )
* will cope with it. * will cope with it.
*/ */
case EADDRNOTAVAIL: case EADDRNOTAVAIL:
debug( "retrying" ); retry--;
sleep( 1 ); if (retry) {
debug( "retrying" );
sleep( 1 );
}
continue; continue;
case EADDRINUSE: case EADDRINUSE:
warn( "%s in use, giving up.", s_address ); warn( "%s in use, giving up.", s_address );
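
The sock_try_bind() change above replaces an unbounded retry on EADDRNOTAVAIL with a countdown. The following sketch shows the general shape of such a bounded retry loop. It is not the project's code: `try_with_retries` and the `flaky_bind` stand-in for bind() are hypothetical, introduced only so that the loop can be exercised without a socket.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical stand-in for bind(): fails with EADDRNOTAVAIL a fixed number
 * of times, then succeeds. */
struct flaky_bind {
    int failures_left;
    int calls;
};

static int flaky_bind_attempt(void *ctx)
{
    struct flaky_bind *fb = ctx;

    fb->calls++;
    if (fb->failures_left > 0) {
        fb->failures_left--;
        errno = EADDRNOTAVAIL;
        return -1;
    }
    return 0;
}

/* Call `attempt` up to `max_tries` times, sleeping between tries, but only
 * while the failure is the transient EADDRNOTAVAIL. Returns 0 or -1. */
static int try_with_retries(int (*attempt)(void *), void *ctx,
                            int max_tries, unsigned int delay_secs)
{
    int retry = max_tries;

    assert(attempt != NULL && max_tries > 0);
    while (retry > 0) {
        if (attempt(ctx) == 0) { return 0; }
        if (errno != EADDRNOTAVAIL) { return -1; } /* not worth retrying */
        retry--;
        if (retry > 0 && delay_secs > 0) { sleep(delay_secs); }
    }
    errno = EADDRNOTAVAIL; /* retries exhausted */
    return -1;
}
```

With `max_tries` set to 10, a bind that fails three times succeeds on the fourth call, while one that never becomes available gives up after exactly ten attempts instead of looping forever.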


@@ -20,8 +20,8 @@ int sock_set_reuseaddr(int fd, int optval);
/* Set the tcp_nodelay option */ /* Set the tcp_nodelay option */
int sock_set_tcp_nodelay(int fd, int optval); int sock_set_tcp_nodelay(int fd, int optval);
/* TODO: Set the tcp_cork option */ /* Set the tcp_cork option */
// int sock_set_cork(int fd, int optval); int sock_set_tcp_cork(int fd, int optval);
int sock_set_nonblock(int fd, int optval); int sock_set_nonblock(int fd, int optval);


@@ -116,6 +116,7 @@ uint64_t monotonic_time_ms(void);
#define fatal(msg, ...) do { \ #define fatal(msg, ...) do { \
myloglev(4, msg, ##__VA_ARGS__); \ myloglev(4, msg, ##__VA_ARGS__); \
error_handler(1); \ error_handler(1); \
exit(1); /* never reached; this is to keep the static code analyzer happy */ \
} while(0) } while(0)


@@ -2,12 +2,16 @@
#include "mode.h" #include "mode.h"
#include <signal.h> #include <signal.h>
#include <stdlib.h>
#include <time.h>
int main(int argc, char** argv) int main(int argc, char** argv)
{ {
signal(SIGPIPE, SIG_IGN); /* calls to splice() unhelpfully throw this */ signal(SIGPIPE, SIG_IGN); /* calls to splice() unhelpfully throw this */
error_init(); error_init();
srand(time(NULL));
if (argc < 2) { if (argc < 2) {
exit_err( help_help_text ); exit_err( help_help_text );
} }


@@ -1,14 +0,0 @@
#ifndef PREFETCH_H
#define PREFETCH_H
#define PREFETCH_BUFSIZE 4096
struct prefetch {
int is_full;
__be64 from;
__be32 len;
char buffer[PREFETCH_BUFSIZE];
};
#endif


@@ -1,4 +1,6 @@
#include <signal.h> #include <signal.h>
#include <stdlib.h>
#include <time.h>
#include "mode.h" #include "mode.h"
#include "util.h" #include "util.h"
@@ -12,6 +14,7 @@ static struct option proxy_options[] = {
GETOPT_CONNECT_ADDR, GETOPT_CONNECT_ADDR,
GETOPT_CONNECT_PORT, GETOPT_CONNECT_PORT,
GETOPT_BIND, GETOPT_BIND,
GETOPT_CACHE,
GETOPT_QUIET, GETOPT_QUIET,
GETOPT_VERBOSE, GETOPT_VERBOSE,
{0} {0}
@@ -27,22 +30,25 @@ static char proxy_help_text[] =
"\t--" OPT_CONNECT_ADDR ",-C <ADDR>\tAddress of the proxied server.\n" "\t--" OPT_CONNECT_ADDR ",-C <ADDR>\tAddress of the proxied server.\n"
"\t--" OPT_CONNECT_PORT ",-P <PORT>\tPort of the proxied server.\n" "\t--" OPT_CONNECT_PORT ",-P <PORT>\tPort of the proxied server.\n"
"\t--" OPT_BIND ",-b <ADDR>\tThe address we connect from, as a proxy.\n" "\t--" OPT_BIND ",-b <ADDR>\tThe address we connect from, as a proxy.\n"
"\t--" OPT_CACHE ",-c[=<CACHE-BYTES>]\tUse a RAM read cache of the given size.\n"
QUIET_LINE QUIET_LINE
VERBOSE_LINE; VERBOSE_LINE;
static char proxy_default_cache_size[] = "4096";
void read_proxy_param( void read_proxy_param(
int c, int c,
char **downstream_addr, char **downstream_addr,
char **downstream_port, char **downstream_port,
char **upstream_addr, char **upstream_addr,
char **upstream_port, char **upstream_port,
char **bind_addr ) char **bind_addr,
char **cache_bytes)
{ {
switch( c ) { switch( c ) {
case 'h' : case 'h' :
fprintf( stdout, "%s\n", proxy_help_text ); fprintf( stdout, "%s\n", proxy_help_text );
exit( 0 ); exit( 0 );
break;
case 'l': case 'l':
*downstream_addr = optarg; *downstream_addr = optarg;
break; break;
@@ -58,6 +64,9 @@ void read_proxy_param(
case 'b': case 'b':
*bind_addr = optarg; *bind_addr = optarg;
break; break;
case 'c':
*cache_bytes = optarg ? optarg : proxy_default_cache_size;
break;
case 'q': case 'q':
log_level = QUIET_LOG_LEVEL; log_level = QUIET_LOG_LEVEL;
break; break;
@@ -89,6 +98,7 @@ int main( int argc, char *argv[] )
char *upstream_addr = NULL; char *upstream_addr = NULL;
char *upstream_port = NULL; char *upstream_port = NULL;
char *bind_addr = NULL; char *bind_addr = NULL;
char *cache_bytes = NULL;
int success; int success;
sigset_t mask; sigset_t mask;
@@ -103,6 +113,8 @@ int main( int argc, char *argv[] )
exit_action.sa_mask = mask; exit_action.sa_mask = mask;
exit_action.sa_flags = 0; exit_action.sa_flags = 0;
srand(time(NULL));
while (1) { while (1) {
c = getopt_long( argc, argv, proxy_short_options, proxy_options, NULL ); c = getopt_long( argc, argv, proxy_short_options, proxy_options, NULL );
if ( -1 == c ) { break; } if ( -1 == c ) { break; }
@@ -111,7 +123,8 @@ int main( int argc, char *argv[] )
&downstream_port, &downstream_port,
&upstream_addr, &upstream_addr,
&upstream_port, &upstream_port,
&bind_addr &bind_addr,
&cache_bytes
); );
} }
@@ -128,7 +141,8 @@ int main( int argc, char *argv[] )
downstream_port, downstream_port,
upstream_addr, upstream_addr,
upstream_port, upstream_port,
bind_addr bind_addr,
cache_bytes
); );
/* Set these *after* proxy has been assigned to */ /* Set these *after* proxy has been assigned to */
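
The new `--cache` option is declared with `optional_argument`, and the handler falls back to a default when `optarg` is NULL. A self-contained sketch of that pattern follows. The helper name `parse_cache_option` and the short-option string `"c::"` are assumptions for illustration (the real `proxy_short_options` string is not shown in this diff), and the `c::` syntax is a GNU getopt extension.

```c
#include <assert.h>
#include <getopt.h>
#include <stddef.h>

static char default_cache_size[] = "4096";

/* Hypothetical helper: returns the cache size string, the default if the
 * option was given without a value, or NULL if it was not given at all. */
static const char *parse_cache_option(int argc, char **argv)
{
    static const struct option options[] = {
        { "cache", optional_argument, NULL, 'c' },
        { NULL, 0, NULL, 0 }
    };
    const char *cache_bytes = NULL;
    int c;

    assert(argc >= 1 && argv != NULL);
    optind = 1; /* restart scanning so the helper can be called repeatedly */
    while ((c = getopt_long(argc, argv, "c::", options, NULL)) != -1) {
        if (c == 'c') {
            /* optarg is NULL when the option had no attached value. */
            cache_bytes = optarg ? optarg : default_cache_size;
        }
    }
    return cache_bytes;
}
```

Note that with GNU getopt an optional value must be attached to the option, as in `--cache=8192` or `-c8192`. Written as `--cache 8192`, the option is seen without a value and `8192` is treated as a separate non-option argument.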

68
src/proxy/prefetch.c Normal file

@@ -0,0 +1,68 @@
#include "prefetch.h"
#include "util.h"
struct prefetch* prefetch_create( size_t size_bytes ){
struct prefetch* out = xmalloc( sizeof( struct prefetch ) );
NULLCHECK( out );
out->buffer = xmalloc( size_bytes );
NULLCHECK( out->buffer );
out->size = size_bytes;
out->is_full = 0;
out->from = 0;
out->len = 0;
return out;
}
void prefetch_destroy( struct prefetch *prefetch ) {
if( prefetch ) {
free( prefetch->buffer );
free( prefetch );
}
}
size_t prefetch_size( struct prefetch *prefetch){
if ( prefetch ) {
return prefetch->size;
} else {
return 0;
}
}
void prefetch_set_is_empty( struct prefetch *prefetch ){
prefetch_set_full( prefetch, 0 );
}
void prefetch_set_is_full( struct prefetch *prefetch ){
prefetch_set_full( prefetch, 1 );
}
void prefetch_set_full( struct prefetch *prefetch, int val ){
if( prefetch ) {
prefetch->is_full = val;
}
}
int prefetch_is_full( struct prefetch *prefetch ){
if( prefetch ) {
return prefetch->is_full;
} else {
return 0;
}
}
int prefetch_contains( struct prefetch *prefetch, uint64_t from, uint32_t len ){
NULLCHECK( prefetch );
return from >= prefetch->from &&
from + len <= prefetch->from + prefetch->len;
}
char *prefetch_offset( struct prefetch *prefetch, uint64_t from ){
NULLCHECK( prefetch );
return prefetch->buffer + (from - prefetch->from);
}
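
prefetch_contains() and prefetch_offset() together let the proxy serve any read that falls entirely inside the cached window, not only one that matches it exactly. The sketch below mirrors that arithmetic on a simplified, hypothetical `cache_window` struct. Like the code in the diff, it does not guard against `from + len` wrapping near 2^64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical stand-in for struct prefetch. */
struct cache_window {
    uint64_t from;   /* offset of the first cached byte */
    uint32_t len;    /* number of cached bytes          */
    char *buffer;    /* the cached bytes themselves     */
};

/* True if [from, from+len) lies entirely inside the cached window. */
static int window_contains(const struct cache_window *w, uint64_t from, uint32_t len)
{
    assert(w != NULL);
    return from >= w->from && from + len <= w->from + w->len;
}

/* Address within the buffer that holds the byte at image offset `from`. */
static char *window_offset(const struct cache_window *w, uint64_t from)
{
    assert(w != NULL && from >= w->from);
    return w->buffer + (from - w->from);
}
```

With a window covering offsets 4096 to 8191, a 512-byte read at 7680 is a hit, a 512-byte read at 8000 is a miss because it runs past the end, and the byte at offset 5000 lives 904 bytes into the buffer.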

33
src/proxy/prefetch.h Normal file

@@ -0,0 +1,33 @@
#ifndef PREFETCH_H
#define PREFETCH_H
#include <stdint.h>
#include <stddef.h>
#define PREFETCH_BUFSIZE 4096
struct prefetch {
/* True if there is data in the buffer. */
int is_full;
/* The start point of the current content of buffer */
uint64_t from;
/* The length of the current content of buffer */
uint32_t len;
/* The total size of the buffer, in bytes. */
size_t size;
char *buffer;
};
struct prefetch* prefetch_create( size_t size_bytes );
void prefetch_destroy( struct prefetch *prefetch );
size_t prefetch_size( struct prefetch *);
void prefetch_set_is_empty( struct prefetch *prefetch );
void prefetch_set_is_full( struct prefetch *prefetch );
void prefetch_set_full( struct prefetch *prefetch, int val );
int prefetch_is_full( struct prefetch *prefetch );
int prefetch_contains( struct prefetch *prefetch, uint64_t from, uint32_t len );
char *prefetch_offset( struct prefetch *prefetch, uint64_t from );
#endif


@@ -1,9 +1,7 @@
#include "proxy.h" #include "proxy.h"
#include "readwrite.h" #include "readwrite.h"
#ifdef PREFETCH
#include "prefetch.h" #include "prefetch.h"
#endif
#include "ioutil.h" #include "ioutil.h"
@@ -20,7 +18,8 @@ struct proxier* proxy_create(
char* s_downstream_port, char* s_downstream_port,
char* s_upstream_address, char* s_upstream_address,
char* s_upstream_port, char* s_upstream_port,
char* s_upstream_bind ) char* s_upstream_bind,
char* s_cache_bytes )
{ {
struct proxier* out; struct proxier* out;
out = xmalloc( sizeof( struct proxier ) ); out = xmalloc( sizeof( struct proxier ) );
@@ -65,9 +64,16 @@ struct proxier* proxy_create(
out->downstream_fd = -1; out->downstream_fd = -1;
out->upstream_fd = -1; out->upstream_fd = -1;
#ifdef PREFETCH out->prefetch = NULL;
out->prefetch = xmalloc( sizeof( struct prefetch ) ); if ( s_cache_bytes ){
#endif int cache_bytes = atoi( s_cache_bytes );
/* leaving this off or setting a cache size of zero or
* less results in no cache.
*/
if ( cache_bytes >= 0 ) {
out->prefetch = prefetch_create( cache_bytes );
}
}
out->init.buf = xmalloc( sizeof( struct nbd_init_raw ) ); out->init.buf = xmalloc( sizeof( struct nbd_init_raw ) );
out->req.buf = xmalloc( NBD_MAX_SIZE ); out->req.buf = xmalloc( NBD_MAX_SIZE );
@@ -76,20 +82,28 @@ struct proxier* proxy_create(
return out; return out;
} }
int proxy_prefetches( struct proxier* proxy ) {
NULLCHECK( proxy );
return proxy->prefetch != NULL;
}
int proxy_prefetch_bufsize( struct proxier* proxy ){
NULLCHECK( proxy );
return prefetch_size( proxy->prefetch );
}
void proxy_destroy( struct proxier* proxy ) void proxy_destroy( struct proxier* proxy )
{ {
free( proxy->init.buf ); free( proxy->init.buf );
free( proxy->req.buf ); free( proxy->req.buf );
free( proxy->rsp.buf ); free( proxy->rsp.buf );
#ifdef PREFETCH prefetch_destroy( proxy->prefetch );
free( proxy->prefetch );
#endif
free( proxy ); free( proxy );
} }
/* Shared between our two different connect_to_upstream paths */ /* Shared between our two different connect_to_upstream paths */
void proxy_finish_connect_to_upstream( struct proxier *proxy, off64_t size ); void proxy_finish_connect_to_upstream( struct proxier *proxy, uint64_t size );
/* Try to establish a connection to our upstream server. Return 1 on success, /* Try to establish a connection to our upstream server. Return 1 on success,
* 0 on failure. this is a blocking call that returns a non-blocking socket. * 0 on failure. this is a blocking call that returns a non-blocking socket.
@@ -102,7 +116,7 @@ int proxy_connect_to_upstream( struct proxier* proxy )
} }
int fd = socket_connect( &proxy->connect_to.generic, connect_from ); int fd = socket_connect( &proxy->connect_to.generic, connect_from );
off64_t size = 0; uint64_t size = 0;
if ( -1 == fd ) { if ( -1 == fd ) {
return 0; return 0;
@@ -174,7 +188,7 @@ error:
return; return;
} }
void proxy_finish_connect_to_upstream( struct proxier *proxy, off64_t size ) { void proxy_finish_connect_to_upstream( struct proxier *proxy, uint64_t size ) {
if ( proxy->upstream_size == 0 ) { if ( proxy->upstream_size == 0 ) {
info( "Size of upstream image is %"PRIu64" bytes", size ); info( "Size of upstream image is %"PRIu64" bytes", size );
@@ -186,6 +200,13 @@ void proxy_finish_connect_to_upstream( struct proxier *proxy, off64_t size ) {
} }
proxy->upstream_size = size; proxy->upstream_size = size;
if ( AF_UNIX != proxy->connect_to.family ) {
if ( sock_set_tcp_nodelay( proxy->upstream_fd, 1 ) == -1 ) {
warn( SHOW_ERRNO( "Failed to set TCP_NODELAY" ) );
}
}
info( "Connected to upstream on fd %i", proxy->upstream_fd ); info( "Connected to upstream on fd %i", proxy->upstream_fd );
return; return;
@@ -272,10 +293,9 @@ static inline int proxy_state_upstream( int state )
state == WRITE_TO_UPSTREAM || state == READ_FROM_UPSTREAM; state == WRITE_TO_UPSTREAM || state == READ_FROM_UPSTREAM;
} }
#ifdef PREFETCH
int proxy_prefetch_for_request( struct proxier* proxy, int state ) int proxy_prefetch_for_request( struct proxier* proxy, int state )
{ {
NULLCHECK( proxy );
struct nbd_request* req = &proxy->req_hdr; struct nbd_request* req = &proxy->req_hdr;
struct nbd_reply* rsp = &proxy->rsp_hdr; struct nbd_reply* rsp = &proxy->rsp_hdr;
@@ -284,23 +304,11 @@ int proxy_prefetch_for_request( struct proxier* proxy, int state )
int is_read = ( req->type & REQUEST_MASK ) == REQUEST_READ; int is_read = ( req->type & REQUEST_MASK ) == REQUEST_READ;
int prefetch_start = req->from;
int prefetch_end = req->from + ( req->len * 2 );
/* We only want to consider prefetching if we know we're not
* getting too much data back, if it's a read request, and if
* the prefetch won't try to read past the end of the file.
*/
int prefetching = req->len <= PREFETCH_BUFSIZE && is_read &&
prefetch_start < prefetch_end && prefetch_end <= proxy->upstream_size;
if ( is_read ) { if ( is_read ) {
/* See if we can respond with what's in our prefetch /* See if we can respond with what's in our prefetch
* cache */ * cache */
if ( proxy->prefetch->is_full && if ( prefetch_is_full( proxy->prefetch ) &&
req->from == proxy->prefetch->from && prefetch_contains( proxy->prefetch, req->from, req->len ) ) {
req->len == proxy->prefetch->len ) {
/* HUZZAH! A match! */ /* HUZZAH! A match! */
debug( "Prefetch hit!" ); debug( "Prefetch hit!" );
@@ -315,10 +323,11 @@ int proxy_prefetch_for_request( struct proxier* proxy, int state )
/* and the data */ /* and the data */
memcpy( memcpy(
proxy->rsp.buf + NBD_REPLY_SIZE, proxy->rsp.buf + NBD_REPLY_SIZE,
proxy->prefetch->buffer, proxy->prefetch->len prefetch_offset( proxy->prefetch, req->from ),
req->len
); );
proxy->rsp.size = NBD_REPLY_SIZE + proxy->prefetch->len; proxy->rsp.size = NBD_REPLY_SIZE + req->len;
proxy->rsp.needle = 0; proxy->rsp.needle = 0;
/* return early, our work here is done */ /* return early, our work here is done */
@@ -332,11 +341,24 @@ int proxy_prefetch_for_request( struct proxier* proxy, int state )
* whether we can keep it or not. * whether we can keep it or not.
*/ */
debug( "Blowing away prefetch cache on type %d request.", req->type ); debug( "Blowing away prefetch cache on type %d request.", req->type );
proxy->prefetch->is_full = 0; prefetch_set_is_empty( proxy->prefetch );
} }
debug( "Prefetch cache MISS!"); debug( "Prefetch cache MISS!");
uint64_t prefetch_start = req->from;
/* We prefetch what we expect to be the next request. */
uint64_t prefetch_end = req->from + ( req->len * 2 );
/* We only want to consider prefetching if we know we're not
* getting too much data back, if it's a read request, and if
* the prefetch won't try to read past the end of the file.
*/
int prefetching =
req->len <= prefetch_size( proxy->prefetch ) &&
is_read &&
prefetch_start < prefetch_end &&
prefetch_end <= proxy->upstream_size;
/* We pull the request out of the proxy struct, rewrite the /* We pull the request out of the proxy struct, rewrite the
* request size, and write it back. * request size, and write it back.
@@ -347,7 +369,8 @@ int proxy_prefetch_for_request( struct proxier* proxy, int state )
req->len *= 2; req->len *= 2;
debug( "Prefetching %"PRIu32" bytes", req->len - proxy->prefetch_req_orig_len ); debug( "Prefetching additional %"PRIu32" bytes",
req->len - proxy->prefetch_req_orig_len );
nbd_h2r_request( req, req_raw ); nbd_h2r_request( req, req_raw );
} }
@@ -364,10 +387,10 @@ int proxy_prefetch_for_reply( struct proxier* proxy, int state )
prefetched_bytes = proxy->req_hdr.len - proxy->prefetch_req_orig_len; prefetched_bytes = proxy->req_hdr.len - proxy->prefetch_req_orig_len;
debug( "Prefetched %d bytes", prefetched_bytes ); debug( "Prefetched additional %d bytes", prefetched_bytes );
memcpy( memcpy(
proxy->rsp.buf + proxy->prefetch_req_orig_len, proxy->prefetch->buffer,
&(proxy->prefetch->buffer), proxy->rsp.buf + proxy->prefetch_req_orig_len + NBD_REPLY_SIZE,
prefetched_bytes prefetched_bytes
); );
@@ -382,13 +405,12 @@ int proxy_prefetch_for_reply( struct proxier* proxy, int state )
proxy->rsp.size -= prefetched_bytes; proxy->rsp.size -= prefetched_bytes;
/* And we need to reset these */ /* And we need to reset these */
proxy->prefetch->is_full = 1; prefetch_set_is_full( proxy->prefetch );
proxy->is_prefetch_req = 0; proxy->is_prefetch_req = 0;
return state; return state;
} }
#endif
int proxy_read_from_downstream( struct proxier *proxy, int state ) int proxy_read_from_downstream( struct proxier *proxy, int state )
@@ -469,10 +491,8 @@ int proxy_continue_connecting_to_upstream( struct proxier* proxy, int state )
return state; return state;
} }
#ifdef PREFETCH
/* Data may have changed while we were disconnected */ /* Data may have changed while we were disconnected */
proxy->prefetch->is_full = 0; prefetch_set_is_empty( proxy->prefetch );
#endif
info( "Connected to upstream on fd %i", proxy->upstream_fd ); info( "Connected to upstream on fd %i", proxy->upstream_fd );
return READ_INIT_FROM_UPSTREAM; return READ_INIT_FROM_UPSTREAM;
@@ -492,7 +512,7 @@ int proxy_read_init_from_upstream( struct proxier* proxy, int state )
} }
if ( proxy->init.needle == proxy->init.size ) { if ( proxy->init.needle == proxy->init.size ) {
off64_t upstream_size; uint64_t upstream_size;
if ( !nbd_check_hello( (struct nbd_init_raw*) proxy->init.buf, &upstream_size ) ) { if ( !nbd_check_hello( (struct nbd_init_raw*) proxy->init.buf, &upstream_size ) ) {
warn( "Upstream sent invalid init" ); warn( "Upstream sent invalid init" );
goto disconnect; goto disconnect;
@@ -518,11 +538,22 @@ int proxy_write_to_upstream( struct proxier* proxy, int state )
ssize_t count; ssize_t count;
// assert( state == WRITE_TO_UPSTREAM ); // assert( state == WRITE_TO_UPSTREAM );
/* FIXME: We may set cork=1 multiple times as a result of this idiom.
* Not a serious problem, but we could do better
*/
if ( proxy->req.needle == 0 && AF_UNIX != proxy->connect_to.family ) {
if ( sock_set_tcp_cork( proxy->upstream_fd, 1 ) == -1 ) {
warn( SHOW_ERRNO( "Failed to set TCP_CORK" ) );
}
}
count = iobuf_write( proxy->upstream_fd, &proxy->req ); count = iobuf_write( proxy->upstream_fd, &proxy->req );
if ( count == -1 ) { if ( count == -1 ) {
warn( SHOW_ERRNO( "Failed to send request to upstream" ) ); warn( SHOW_ERRNO( "Failed to send request to upstream" ) );
proxy->req.needle = 0; proxy->req.needle = 0;
// We're throwing the socket away so no need to uncork
return CONNECT_TO_UPSTREAM; return CONNECT_TO_UPSTREAM;
} }
@@ -531,6 +562,14 @@ int proxy_write_to_upstream( struct proxier* proxy, int state )
* still need req.size if reading the reply fails - we disconnect * still need req.size if reading the reply fails - we disconnect
* and resend the reply in that case - so keep it around for now. */ * and resend the reply in that case - so keep it around for now. */
proxy->req.needle = 0; proxy->req.needle = 0;
if ( AF_UNIX != proxy->connect_to.family ) {
if ( sock_set_tcp_cork( proxy->upstream_fd, 0 ) == -1 ) {
warn( SHOW_ERRNO( "Failed to unset TCP_CORK" ) );
// TODO: should we return to CONNECT_TO_UPSTREAM in this instance?
}
}
return READ_FROM_UPSTREAM; return READ_FROM_UPSTREAM;
} }
@@ -670,7 +709,7 @@ void proxy_session( struct proxier* proxy )
state_started = monotonic_time_ms(); state_started = monotonic_time_ms();
debug( debug(
"State transitition from %s to %s", "State transition from %s to %s",
proxy_session_state_names[old_state], proxy_session_state_names[old_state],
proxy_session_state_names[state] proxy_session_state_names[state]
); );
@@ -736,14 +775,12 @@ void proxy_session( struct proxier* proxy )
case READ_FROM_DOWNSTREAM: case READ_FROM_DOWNSTREAM:
if ( FD_ISSET( proxy->downstream_fd, &rfds ) ) { if ( FD_ISSET( proxy->downstream_fd, &rfds ) ) {
state = proxy_read_from_downstream( proxy, state ); state = proxy_read_from_downstream( proxy, state );
#ifdef PREFETCH
/* Check if we can fulfil the request from prefetch, or /* Check if we can fulfil the request from prefetch, or
* rewrite the request to fill the prefetch buffer if needed * rewrite the request to fill the prefetch buffer if needed
*/ */
if ( state == WRITE_TO_UPSTREAM ) { if ( proxy_prefetches( proxy ) && state == WRITE_TO_UPSTREAM ) {
state = proxy_prefetch_for_request( proxy, state ); state = proxy_prefetch_for_request( proxy, state );
} }
#endif
} }
break; break;
case CONNECT_TO_UPSTREAM: case CONNECT_TO_UPSTREAM:
@@ -774,12 +811,10 @@ void proxy_session( struct proxier* proxy )
if ( FD_ISSET( proxy->upstream_fd, &rfds ) ) { if ( FD_ISSET( proxy->upstream_fd, &rfds ) ) {
state = proxy_read_from_upstream( proxy, state ); state = proxy_read_from_upstream( proxy, state );
} }
# ifdef PREFETCH
/* Fill the prefetch buffer and rewrite the reply, if needed */ /* Fill the prefetch buffer and rewrite the reply, if needed */
if ( state == WRITE_TO_DOWNSTREAM ) { if ( proxy_prefetches( proxy ) && state == WRITE_TO_DOWNSTREAM ) {
state = proxy_prefetch_for_reply( proxy, state ); state = proxy_prefetch_for_reply( proxy, state );
} }
#endif
break; break;
case WRITE_TO_DOWNSTREAM: case WRITE_TO_DOWNSTREAM:
if ( FD_ISSET( proxy->downstream_fd, &wfds ) ) { if ( FD_ISSET( proxy->downstream_fd, &wfds ) ) {
@@ -797,6 +832,13 @@ void proxy_session( struct proxier* proxy )
proxy_session_state_names[state] proxy_session_state_names[state]
); );
state = CONNECT_TO_UPSTREAM; state = CONNECT_TO_UPSTREAM;
/* Since we've timed out, we won't have gone through the timeout logic
* in the various state handlers that reset these appropriately, so
* reset them here. */
proxy->init.size = 0;
proxy->init.needle = 0;
proxy->rsp.size = 0;
proxy->rsp.needle = 0;
} }
} }
} }

@@ -5,7 +5,6 @@
#include <unistd.h> #include <unistd.h>
#include "ioutil.h" #include "ioutil.h"
#include "flexnbd.h"
#include "parse.h" #include "parse.h"
#include "nbdtypes.h" #include "nbdtypes.h"
#include "self_pipe.h" #include "self_pipe.h"
@@ -21,9 +20,6 @@
#define UPSTREAM_TIMEOUT 30 * 1000 #define UPSTREAM_TIMEOUT 30 * 1000
struct proxier { struct proxier {
/* The flexnbd wrapper this proxier is attached to */
struct flexnbd* flexnbd;
/** address/port to bind to */ /** address/port to bind to */
union mysockaddr listen_on; union mysockaddr listen_on;
@@ -48,7 +44,7 @@ struct proxier {
int upstream_fd; int upstream_fd;
/* This is the size we advertise to the downstream server */ /* This is the size we advertise to the downstream server */
off64_t upstream_size; uint64_t upstream_size;
/* We transform the raw request header into here */ /* We transform the raw request header into here */
struct nbd_request req_hdr; struct nbd_request req_hdr;
@@ -73,7 +69,8 @@ struct proxier {
uint64_t req_count; uint64_t req_count;
int hello_sent; int hello_sent;
#ifdef PREFETCH /** These are only used if we pass --cache on the command line */
/* While the in-flight request has been munged by prefetch, these two are /* While the in-flight request has been munged by prefetch, these two are
* set to true, and the original length of the request, respectively */ * set to true, and the original length of the request, respectively */
int is_prefetch_req; int is_prefetch_req;
@@ -81,7 +78,8 @@ struct proxier {
/* And here, we actually store the prefetched data once it's returned */ /* And here, we actually store the prefetched data once it's returned */
struct prefetch *prefetch; struct prefetch *prefetch;
#endif
/** */
}; };
struct proxier* proxy_create( struct proxier* proxy_create(
@@ -89,7 +87,8 @@ struct proxier* proxy_create(
char* s_downstream_port, char* s_downstream_port,
char* s_upstream_address, char* s_upstream_address,
char* s_upstream_port, char* s_upstream_port,
char* s_upstream_bind ); char* s_upstream_bind,
char* s_cache_bytes);
int do_proxy( struct proxier* proxy ); int do_proxy( struct proxier* proxy );
void proxy_cleanup( struct proxier* proxy ); void proxy_cleanup( struct proxier* proxy );
void proxy_destroy( struct proxier* proxy ); void proxy_destroy( struct proxier* proxy );

@@ -1,23 +0,0 @@
#ifndef READWRITE_H
#define READWRITE_H
#include <sys/types.h>
#include <sys/socket.h>
#include "nbdtypes.h"
int socket_connect(struct sockaddr* to, struct sockaddr* from);
int socket_nbd_read_hello(int fd, off64_t * size);
int socket_nbd_write_hello(int fd, off64_t size);
void socket_nbd_read(int fd, off64_t from, int len, int out_fd, void* out_buf, int timeout_secs);
void socket_nbd_write(int fd, off64_t from, int len, int out_fd, void* out_buf, int timeout_secs);
int socket_nbd_disconnect( int fd );
/* as you can see, we're slowly accumulating code that should really be in an
* NBD library */
void nbd_hello_to_buf( struct nbd_init_raw* buf, off64_t out_size );
int nbd_check_hello( struct nbd_init_raw* init_raw, off64_t* out_size );
#endif

@@ -31,7 +31,7 @@ static int is_included_in_acl(int list_length, struct ip_and_mask (*list)[], uni
for (i=0; i < list_length; i++) { for (i=0; i < list_length; i++) {
struct ip_and_mask *entry = &(*list)[i]; struct ip_and_mask *entry = &(*list)[i];
int testbits; int testbits;
unsigned char *raw_address1, *raw_address2; unsigned char *raw_address1 = NULL, *raw_address2 = NULL;
debug("checking acl entry %d (%d/%d)", i, test->generic.sa_family, entry->ip.family); debug("checking acl entry %d (%d/%d)", i, test->generic.sa_family, entry->ip.family);

@@ -7,43 +7,64 @@
#include <string.h> #include <string.h>
#include <pthread.h> #include <pthread.h>
/*
* Make the bitfield words 'opaque' to prevent code
* poking at the bits directly without using these
* accessors/macros
*/
typedef uint64_t bitfield_word_t;
typedef bitfield_word_t * bitfield_p;
static inline char char_with_bit_set(uint64_t num) { return 1<<(num%8); } #define BITFIELD_WORD_SIZE sizeof(bitfield_word_t)
#define BITS_PER_WORD (BITFIELD_WORD_SIZE * 8)
#define BIT_MASK(_idx) \
((bitfield_word_t)1 << ((_idx) & (BITS_PER_WORD - 1)))
#define BIT_WORD(_b, _idx) \
((bitfield_word_t*)(_b))[(_idx) / BITS_PER_WORD]
/* Calculates the number of words needed to store _bytes bytes.
* This accommodates callers that work with sizes in bytes.
*/
#define BIT_WORDS_FOR_SIZE(_bytes) \
(((_bytes) + (BITFIELD_WORD_SIZE-1)) / BITFIELD_WORD_SIZE)
/** Return the bit value ''idx'' in array ''b'' */
static inline int bit_get(bitfield_p b, uint64_t idx) {
return (BIT_WORD(b, idx) >> (idx & (BITS_PER_WORD-1))) & 1;
}
/** Return 1 if the bit at ''idx'' in array ''b'' is set */ /** Return 1 if the bit at ''idx'' in array ''b'' is set */
static inline int bit_is_set(char* b, uint64_t idx) { static inline int bit_is_set(bitfield_p b, uint64_t idx) {
return (b[idx/8] & char_with_bit_set(idx)) != 0; return bit_get(b, idx);
} }
/** Return 1 if the bit at ''idx'' in array ''b'' is clear */ /** Return 1 if the bit at ''idx'' in array ''b'' is clear */
static inline int bit_is_clear(char* b, uint64_t idx) { static inline int bit_is_clear(bitfield_p b, uint64_t idx) {
return !bit_is_set(b, idx); return !bit_get(b, idx);
} }
/** Tests whether the bit at ''idx'' in array ''b'' has value ''value'' */ /** Tests whether the bit at ''idx'' in array ''b'' has value ''value'' */
static inline int bit_has_value(char* b, uint64_t idx, int value) { static inline int bit_has_value(bitfield_p b, uint64_t idx, int value) {
if (value) { return bit_is_set(b, idx); } return bit_get(b, idx) == !!value;
else { return bit_is_clear(b, idx); }
} }
/** Sets the bit ''idx'' in array ''b'' */ /** Sets the bit ''idx'' in array ''b'' */
static inline void bit_set(char* b, uint64_t idx) { static inline void bit_set(bitfield_p b, uint64_t idx) {
b[idx/8] |= char_with_bit_set(idx); BIT_WORD(b, idx) |= BIT_MASK(idx);
//__sync_fetch_and_or(b+(idx/8), char_with_bit_set(idx));
} }
/** Clears the bit ''idx'' in array ''b'' */ /** Clears the bit ''idx'' in array ''b'' */
static inline void bit_clear(char* b, uint64_t idx) { static inline void bit_clear(bitfield_p b, uint64_t idx) {
b[idx/8] &= ~char_with_bit_set(idx); BIT_WORD(b, idx) &= ~BIT_MASK(idx);
//__sync_fetch_and_nand(b+(idx/8), char_with_bit_set(idx));
} }
/** Sets ''len'' bits in array ''b'' starting at offset ''from'' */ /** Sets ''len'' bits in array ''b'' starting at offset ''from'' */
static inline void bit_set_range(char* b, uint64_t from, uint64_t len) static inline void bit_set_range(bitfield_p b, uint64_t from, uint64_t len)
{ {
for ( ; from%8 != 0 && len > 0 ; len-- ) { for ( ; (from % BITS_PER_WORD) != 0 && len > 0 ; len-- ) {
bit_set( b, from++ ); bit_set( b, from++ );
} }
if (len >= 8) { if (len >= BITS_PER_WORD) {
memset(b+(from/8), 255, len/8 ); memset(&BIT_WORD(b, from), 0xff, (len / BITS_PER_WORD) * BITFIELD_WORD_SIZE );
from += len; from += len;
len = (len%8); len = len % BITS_PER_WORD;
from -= len; from -= len;
} }
@@ -52,16 +73,16 @@ static inline void bit_set_range(char* b, uint64_t from, uint64_t len)
} }
} }
/** Clears ''len'' bits in array ''b'' starting at offset ''from'' */ /** Clears ''len'' bits in array ''b'' starting at offset ''from'' */
static inline void bit_clear_range(char* b, uint64_t from, uint64_t len) static inline void bit_clear_range(bitfield_p b, uint64_t from, uint64_t len)
{ {
for ( ; from%8 != 0 && len > 0 ; len-- ) { for ( ; (from % BITS_PER_WORD) != 0 && len > 0 ; len-- ) {
bit_clear( b, from++ ); bit_clear( b, from++ );
} }
if (len >= 8) { if (len >= BITS_PER_WORD) {
memset(b+(from/8), 0, len/8 ); memset(&BIT_WORD(b, from), 0, (len / BITS_PER_WORD) * BITFIELD_WORD_SIZE );
from += len; from += len;
len = (len%8); len = len % BITS_PER_WORD;
from -= len; from -= len;
} }
@@ -75,34 +96,33 @@ static inline void bit_clear_range(char* b, uint64_t from, uint64_t len)
* bits that are the same as the first one specified. If ''run_is_set'' is * bits that are the same as the first one specified. If ''run_is_set'' is
* non-NULL, the value of that bit is placed into it. * non-NULL, the value of that bit is placed into it.
*/ */
static inline uint64_t bit_run_count(char* b, uint64_t from, uint64_t len, int *run_is_set) { static inline uint64_t bit_run_count(bitfield_p b, uint64_t from, uint64_t len, int *run_is_set) {
uint64_t* current_block;
uint64_t count = 0; uint64_t count = 0;
int first_value = bit_is_set(b, from); int first_value = bit_get(b, from);
bitfield_word_t word_match = first_value ? -1 : 0;
if ( run_is_set != NULL ) { if ( run_is_set != NULL ) {
*run_is_set = first_value; *run_is_set = first_value;
} }
for ( ; (from+count) % 64 != 0 && len > 0; len--) { for ( ; ((from + count) % BITS_PER_WORD) != 0 && len > 0; len--) {
if (bit_has_value(b, from+count, first_value)) { if (bit_has_value(b, from + count, first_value)) {
count++; count++;
} else { } else {
return count; return count;
} }
} }
for ( ; len >= 64 ; len -= 64 ) { for ( ; len >= BITS_PER_WORD ; len -= BITS_PER_WORD ) {
current_block = (uint64_t*) (b + ((from+count)/8)); if (BIT_WORD(b, from + count) == word_match) {
if (*current_block == ( first_value ? UINT64_MAX : 0 ) ) { count += BITS_PER_WORD;
count += 64;
} else { } else {
break; break;
} }
} }
for ( ; len > 0; len-- ) { for ( ; len > 0; len-- ) {
if ( bit_has_value(b, from+count, first_value) ) { if ( bit_has_value(b, from + count, first_value) ) {
count++; count++;
} }
} }
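The accessors above move the bitfield from byte-sized to 64-bit words, and `bit_run_count` compares whole words once the position is word-aligned. The following is a minimal, self-contained sketch of that technique under simplified assumptions. The `sketch_` names are hypothetical and the code is an illustration, not the flexnbd implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t sketch_word_t;
#define SKETCH_WORD_BITS (sizeof(sketch_word_t) * 8)

/* Mask with only the bit for idx set; the cast keeps the shift unsigned. */
static inline sketch_word_t sketch_mask(uint64_t idx)
{
    return (sketch_word_t)1 << (idx % SKETCH_WORD_BITS);
}

static inline int sketch_bit_get(const sketch_word_t *b, uint64_t idx)
{
    return (int)((b[idx / SKETCH_WORD_BITS] >> (idx % SKETCH_WORD_BITS)) & 1);
}

static inline void sketch_bit_set(sketch_word_t *b, uint64_t idx)
{
    b[idx / SKETCH_WORD_BITS] |= sketch_mask(idx);
}

static inline void sketch_bit_clear(sketch_word_t *b, uint64_t idx)
{
    b[idx / SKETCH_WORD_BITS] &= ~sketch_mask(idx);
}

/* Count how many consecutive bits, starting at from and looking at no more
 * than len bits, have the same value as the bit at from. */
static uint64_t sketch_run_count(const sketch_word_t *b, uint64_t from, uint64_t len)
{
    assert(b != NULL);
    if (len == 0) {
        return 0;
    }
    int first = sketch_bit_get(b, from);
    sketch_word_t match = first ? ~(sketch_word_t)0 : 0;
    uint64_t count = 0;

    /* Bit by bit until the position is word-aligned. */
    while (count < len && (from + count) % SKETCH_WORD_BITS != 0) {
        if (sketch_bit_get(b, from + count) != first) {
            return count;
        }
        count++;
    }
    /* Whole words while they consist entirely of the first value. */
    while (len - count >= SKETCH_WORD_BITS &&
           b[(from + count) / SKETCH_WORD_BITS] == match) {
        count += SKETCH_WORD_BITS;
    }
    /* Remaining bits one at a time. */
    while (count < len && sketch_bit_get(b, from + count) == first) {
        count++;
    }
    return count;
}
```

The word-sized comparison is only valid when the backing storage is a whole number of words, which is why the allocation size matters once the element type changes from `char` to a 64-bit word.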
@@ -116,6 +136,7 @@ enum bitset_stream_events {
BITSET_STREAM_ON = 2, BITSET_STREAM_ON = 2,
BITSET_STREAM_OFF = 3 BITSET_STREAM_OFF = 3
}; };
#define BITSET_STREAM_EVENTS_ENUM_SIZE 4
struct bitset_stream_entry { struct bitset_stream_entry {
enum bitset_stream_events event; enum bitset_stream_events event;
@@ -138,6 +159,7 @@ struct bitset_stream {
pthread_mutex_t mutex; pthread_mutex_t mutex;
pthread_cond_t cond_not_full; pthread_cond_t cond_not_full;
pthread_cond_t cond_not_empty; pthread_cond_t cond_not_empty;
uint64_t queued_bytes[BITSET_STREAM_EVENTS_ENUM_SIZE];
}; };
@@ -152,7 +174,7 @@ struct bitset {
int resolution; int resolution;
struct bitset_stream *stream; struct bitset_stream *stream;
int stream_enabled; int stream_enabled;
char bits[]; bitfield_word_t bits[];
}; };
/** Allocate a bitset for a file of the given size, and chunks of the /** Allocate a bitset for a file of the given size, and chunks of the
@@ -160,9 +182,12 @@ struct bitset {
*/ */
static inline struct bitset *bitset_alloc( uint64_t size, int resolution ) static inline struct bitset *bitset_alloc( uint64_t size, int resolution )
{ {
struct bitset *bitset = xmalloc( // calculate a size to allocate that is a multiple of the size of the
sizeof( struct bitset ) + ( size + resolution - 1 ) / resolution // bitfield word
); size_t bitfield_size =
BIT_WORDS_FOR_SIZE( ( ( size + resolution - 1 ) / resolution + 7 ) / 8 ) * sizeof( bitfield_word_t );
struct bitset *bitset = xmalloc( sizeof( struct bitset ) + bitfield_size );
bitset->size = size; bitset->size = size;
bitset->resolution = resolution; bitset->resolution = resolution;
/* don't actually need to call pthread_mutex_destroy '*/ /* don't actually need to call pthread_mutex_destroy '*/
@@ -217,13 +242,14 @@ static inline void bitset_stream_enqueue(
stream->entries[stream->in].event = event; stream->entries[stream->in].event = event;
stream->entries[stream->in].from = from; stream->entries[stream->in].from = from;
stream->entries[stream->in].len = len; stream->entries[stream->in].len = len;
stream->queued_bytes[event] += len;
stream->size++; stream->size++;
stream->in++; stream->in++;
stream->in %= BITSET_STREAM_SIZE; stream->in %= BITSET_STREAM_SIZE;
pthread_mutex_unlock( & stream->mutex ); pthread_mutex_unlock( & stream->mutex );
pthread_cond_broadcast( &stream->cond_not_empty ); pthread_cond_signal( &stream->cond_not_empty );
return; return;
} }
@@ -234,6 +260,7 @@ static inline void bitset_stream_dequeue(
) )
{ {
struct bitset_stream * stream = set->stream; struct bitset_stream * stream = set->stream;
struct bitset_stream_entry * dequeued;
pthread_mutex_lock( &stream->mutex ); pthread_mutex_lock( &stream->mutex );
@@ -241,18 +268,21 @@ static inline void bitset_stream_dequeue(
pthread_cond_wait( &stream->cond_not_empty, &stream->mutex ); pthread_cond_wait( &stream->cond_not_empty, &stream->mutex );
} }
dequeued = &stream->entries[stream->out];
if ( out != NULL ) { if ( out != NULL ) {
out->event = stream->entries[stream->out].event; out->event = dequeued->event;
out->from = stream->entries[stream->out].from; out->from = dequeued->from;
out->len = stream->entries[stream->out].len; out->len = dequeued->len;
} }
stream->queued_bytes[dequeued->event] -= dequeued->len;
stream->size--; stream->size--;
stream->out++; stream->out++;
stream->out %= BITSET_STREAM_SIZE; stream->out %= BITSET_STREAM_SIZE;
pthread_mutex_unlock( &stream->mutex ); pthread_mutex_unlock( &stream->mutex );
pthread_cond_broadcast( &stream->cond_not_full ); pthread_cond_signal( &stream->cond_not_full );
return; return;
} }
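The enqueue and dequeue hunks above maintain a per-event `queued_bytes` counter, so the total can be read directly instead of walking the ring. The removed loop in `bitset_stream_queued_bytes` iterated from `out` to `in`, which would visit nothing once `in` has wrapped around below `out`. A minimal single-threaded sketch of the running-counter approach follows. All `sketch_` names are hypothetical, and the mutex and condition variables of the real stream are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum sketch_event { SKETCH_ON = 0, SKETCH_OFF = 1, SKETCH_EVENT_COUNT = 2 };
#define SKETCH_QUEUE_SIZE 8

struct sketch_entry {
    enum sketch_event event;
    uint64_t len;
};

struct sketch_queue {
    struct sketch_entry entries[SKETCH_QUEUE_SIZE];
    int in;   /* next slot to write */
    int out;  /* next slot to read */
    int size; /* number of occupied slots */
    uint64_t queued_bytes[SKETCH_EVENT_COUNT]; /* running totals per event */
};

/* Add an entry and bump the running total. Returns -1 if the ring is full. */
static int sketch_enqueue(struct sketch_queue *q, enum sketch_event ev, uint64_t len)
{
    assert(q != NULL);
    if (q->size == SKETCH_QUEUE_SIZE) {
        return -1;
    }
    q->entries[q->in].event = ev;
    q->entries[q->in].len = len;
    q->queued_bytes[ev] += len;
    q->in = (q->in + 1) % SKETCH_QUEUE_SIZE;
    q->size++;
    return 0;
}

/* Remove the oldest entry and lower the running total. Returns -1 if empty. */
static int sketch_dequeue(struct sketch_queue *q, struct sketch_entry *out)
{
    assert(q != NULL);
    if (q->size == 0) {
        return -1;
    }
    struct sketch_entry *e = &q->entries[q->out];
    if (out != NULL) {
        *out = *e;
    }
    q->queued_bytes[e->event] -= e->len;
    q->out = (q->out + 1) % SKETCH_QUEUE_SIZE;
    q->size--;
    return 0;
}

/* Constant-time lookup, correct whether or not the ring has wrapped. */
static uint64_t sketch_queued_bytes(const struct sketch_queue *q, enum sketch_event ev)
{
    assert(q != NULL);
    return q->queued_bytes[ev];
}
```

In the real stream the counter updates happen while the stream mutex is held, so readers see totals that are consistent with the ring contents.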
@@ -273,17 +303,10 @@ static inline uint64_t bitset_stream_queued_bytes(
enum bitset_stream_events event enum bitset_stream_events event
) )
{ {
uint64_t total = 0; uint64_t total;
int i;
pthread_mutex_lock( &set->stream->mutex ); pthread_mutex_lock( &set->stream->mutex );
total = set->stream->queued_bytes[event];
for ( i = set->stream->out; i < set->stream->in ; i++ ) {
if ( set->stream->entries[i].event == event ) {
total += set->stream->entries[i].len;
}
}
pthread_mutex_unlock( &set->stream->mutex ); pthread_mutex_unlock( &set->stream->mutex );
return total; return total;

@@ -15,6 +15,20 @@
#include <sys/stat.h> #include <sys/stat.h>
#include <fcntl.h> #include <fcntl.h>
// When this signal is delivered, we call shutdown() on the client fd, which
// results in the thread being wound up
void client_killswitch_hit(int signal __attribute__ ((unused)), siginfo_t *info, void *ptr __attribute__ ((unused)))
{
int fd = info->si_value.sival_int;
warn( "Killswitch for fd %i activated, calling shutdown on socket", fd );
FATAL_IF(
-1 == shutdown( fd, SHUT_RDWR ),
SHOW_ERRNO( "Failed to shutdown() the socket, killing the server" )
);
}
struct client *client_create( struct server *serve, int socket ) struct client *client_create( struct server *serve, int socket )
{ {
NULLCHECK( serve ); NULLCHECK( serve );
@@ -25,6 +39,13 @@ struct client *client_create( struct server *serve, int socket )
.sigev_signo = CLIENT_KILLSWITCH_SIGNAL .sigev_signo = CLIENT_KILLSWITCH_SIGNAL
}; };
/*
* Our killswitch calls shutdown() on this socket, forcing read() and
* write() calls blocked on it to return with an error. The thread then
* close()s the socket itself, avoiding races.
*/
evp.sigev_value.sival_int = socket;
c = xmalloc( sizeof( struct client ) ); c = xmalloc( sizeof( struct client ) );
c->stopped = 0; c->stopped = 0;
c->socket = socket; c->socket = socket;
@@ -105,7 +126,9 @@ void write_not_zeroes(struct client* client, uint64_t from, uint64_t len)
debug("(run adjusted to %d)", run); debug("(run adjusted to %d)", run);
} }
if (0) /* useful but expensive */ /*
// Useful but expensive
if (0)
{ {
uint64_t i; uint64_t i;
fprintf(stderr, "full map resolution=%d: ", map->resolution); fprintf(stderr, "full map resolution=%d: ", map->resolution);
@@ -118,6 +141,7 @@ void write_not_zeroes(struct client* client, uint64_t from, uint64_t len)
} }
fprintf(stderr, "\n"); fprintf(stderr, "\n");
} }
*/
#define DO_READ(dst, len) ERROR_IF_NEGATIVE( \ #define DO_READ(dst, len) ERROR_IF_NEGATIVE( \
readloop( \ readloop( \
@@ -199,36 +223,6 @@ int client_read_request( struct client * client , struct nbd_request *out_reques
NULLCHECK( out_request ); NULLCHECK( out_request );
struct nbd_request_raw request_raw; struct nbd_request_raw request_raw;
fd_set fds;
struct timeval * ptv = NULL;
int fd_count;
/* We want a timeout if this is an inbound migration, but not otherwise.
* This is compile-time selectable, as it will break mirror max_bps
*/
#ifdef HAS_LISTEN_TIMEOUT
struct timeval tv = {CLIENT_MAX_WAIT_SECS, 0};
if ( !server_is_in_control( client->serve ) ) {
ptv = &tv;
}
#endif
FD_ZERO(&fds);
FD_SET(client->socket, &fds);
self_pipe_fd_set( client->stop_signal, &fds );
fd_count = sock_try_select(FD_SETSIZE, &fds, NULL, NULL, ptv);
if ( fd_count == 0 ) {
/* This "can't ever happen" */
if ( NULL == ptv ) { fatal( "No FDs selected, and no timeout!" ); }
else { error("Timed out waiting for I/O"); }
}
else if ( fd_count < 0 ) { fatal( "Select failed" ); }
if ( self_pipe_fd_isset( client->stop_signal, &fds ) ){
debug("Client received stop signal.");
return 0;
}
if (fd_read_request(client->socket, &request_raw) == -1) { if (fd_read_request(client->socket, &request_raw) == -1) {
*disconnected = 1; *disconnected = 1;
@@ -255,21 +249,20 @@ int client_read_request( struct client * client , struct nbd_request *out_reques
} }
nbd_r2h_request( &request_raw, out_request ); nbd_r2h_request( &request_raw, out_request );
return 1; return 1;
} }
int fd_write_reply( int fd, char *handle, int error ) int fd_write_reply( int fd, uint64_t handle, int error )
{ {
struct nbd_reply reply; struct nbd_reply reply;
struct nbd_reply_raw reply_raw; struct nbd_reply_raw reply_raw;
reply.magic = REPLY_MAGIC; reply.magic = REPLY_MAGIC;
reply.error = error; reply.error = error;
memcpy( reply.handle, handle, 8 ); reply.handle.w = handle;
nbd_h2r_reply( &reply, &reply_raw ); nbd_h2r_reply( &reply, &reply_raw );
debug( "Replying with %s, %d", handle, error ); debug( "Replying with handle=0x%016"PRIX64", error=%d", handle, error );
if( -1 == writeloop( fd, &reply_raw, sizeof( reply_raw ) ) ) { if( -1 == writeloop( fd, &reply_raw, sizeof( reply_raw ) ) ) {
switch( errno ) { switch( errno ) {
@@ -298,7 +291,7 @@ int fd_write_reply( int fd, char *handle, int error )
*/ */
int client_write_reply( struct client * client, struct nbd_request *request, int error ) int client_write_reply( struct client * client, struct nbd_request *request, int error )
{ {
return fd_write_reply( client->socket, request->handle, error); return fd_write_reply( client->socket, request->handle.w, error);
} }
@@ -307,7 +300,7 @@ void client_write_init( struct client * client, uint64_t size )
struct nbd_init init = {{0}}; struct nbd_init init = {{0}};
struct nbd_init_raw init_raw = {{0}}; struct nbd_init_raw init_raw = {{0}};
memcpy( init.passwd, INIT_PASSWD, sizeof( INIT_PASSWD ) ); memcpy( init.passwd, INIT_PASSWD, sizeof( init.passwd ) );
init.magic = INIT_MAGIC; init.magic = INIT_MAGIC;
init.size = size; init.size = size;
memset( init.reserved, 0, 128 ); memset( init.reserved, 0, 128 );
@@ -379,15 +372,15 @@ int client_request_needs_reply( struct client * client,
* forever. * forever.
*/ */
if (request.magic != REQUEST_MAGIC) { if (request.magic != REQUEST_MAGIC) {
warn("Bad magic 0x%08x from client", request.magic); warn("Bad magic 0x%08X from client", request.magic);
client_write_reply( client, &request, EBADMSG ); client_write_reply( client, &request, EBADMSG );
client->disconnect = 1; // no need to flush client->disconnect = 1; // no need to flush
return 0; return 0;
} }
debug( debug(
"request type=%"PRIu32", from=%"PRIu64", len=%"PRIu32, "request type=%"PRIu32", from=%"PRIu64", len=%"PRIu32", handle=0x%016"PRIX64,
request.type, request.from, request.len request.type, request.from, request.len, request.handle.w
); );
/* check it's not out of range */ /* check it's not out of range */
@@ -416,7 +409,7 @@ int client_request_needs_reply( struct client * client,
return 0; return 0;
default: default:
fatal("Unknown request %08x", request.type); fatal("Unknown request 0x%08X", request.type);
} }
return 1; return 1;
} }
@@ -427,7 +420,8 @@ void client_reply_to_read( struct client* client, struct nbd_request request )
off64_t offset; off64_t offset;
debug("request read %ld+%d", request.from, request.len); debug("request read %ld+%d", request.from, request.len);
client_write_reply( client, &request, 0); sock_set_tcp_cork( client->socket, 1 );
client_write_reply( client, &request, 0 );
offset = request.from; offset = request.from;
@@ -443,12 +437,14 @@ void client_reply_to_read( struct client* client, struct nbd_request request )
"sendfile failed from=%ld, len=%d", "sendfile failed from=%ld, len=%d",
offset, offset,
request.len); request.len);
sock_set_tcp_cork( client->socket, 0 );
} }
void client_reply_to_write( struct client* client, struct nbd_request request ) void client_reply_to_write( struct client* client, struct nbd_request request )
{ {
debug("request write %ld+%d", request.from, request.len); debug("request write from=%"PRIu64", len=%"PRIu32", handle=0x%016"PRIX64, request.from, request.len, request.handle.w);
if (client->serve->allocation_map_built) { if (client->serve->allocation_map_built) {
write_not_zeroes( client, request.from, request.len ); write_not_zeroes( client, request.from, request.len );
} }
@@ -553,35 +549,79 @@ int client_serve_request(struct client* client)
struct nbd_request request = {0}; struct nbd_request request = {0};
int stop = 1; int stop = 1;
int disconnected = 0; int disconnected = 0;
fd_set rfds, efds;
int fd_count;
if ( !client_read_request( client, &request, &disconnected ) ) { return stop; } /* wait until there are some bytes on the fd before committing to reads
if ( disconnected ) { return stop; } * FIXME: this whole scheme is broken because we're using blocking reads.
if ( !client_request_needs_reply( client, request ) ) { * read() can block directly after a select anyway, and it's possible that,
* without the killswitch, we'd hang forever. With the killswitch, we just
* hang for "a while". The Right Thing to do is to rewrite client.c to be
* non-blocking.
*/
FD_ZERO( &rfds );
FD_SET( client->socket, &rfds );
self_pipe_fd_set( client->stop_signal, &rfds );
FD_ZERO( &efds );
FD_SET( client->socket, &efds );
fd_count = sock_try_select( FD_SETSIZE, &rfds, NULL, &efds, NULL );
if ( fd_count == 0 ) {
/* This "can't ever happen" */
fatal( "No FDs selected, and no timeout!" );
}
else if ( fd_count < 0 ) { fatal( "Select failed" ); }
if ( self_pipe_fd_isset( client->stop_signal, &rfds ) ){
debug("Client received stop signal.");
return 1; // Don't try to serve more requests
}
if ( FD_ISSET( client->socket, &efds ) ) {
debug( "Client connection closed" );
return 1;
}
/* We arm / disarm around the whole request cycle. The reason for this is
* that the remote peer could uncleanly die at any point; if we're stuck on
* a blocking read(), then that will hang for (almost) forever. This is bad
* in general, makes the server respond only to kill -9, and breaks
* outward mirroring in a most unpleasant way.
*
* Don't forget to disarm before exiting, no matter what!
*
* Reproducing the hang is simple: open a connection to the flexnbd server,
* write a single byte, and then wait.
*
*/
client_arm_killswitch( client );
if ( !client_read_request( client, &request, &disconnected ) ) {
client_disarm_killswitch( client );
return stop;
}
if ( disconnected ) {
client_disarm_killswitch( client );
return stop;
}
if ( !client_request_needs_reply( client, request ) ) {
client_disarm_killswitch( client );
return client->disconnect; return client->disconnect;
} }
{ {
if ( !server_is_closed( client->serve ) ) { if ( !server_is_closed( client->serve ) ) {
/* We arm / disarm around client_reply() to catch cases where the
* remote peer sends part of a write request data before dying,
* and cases where we send part of read reply data before they die.
*
* That last is theoretical right now, but could break us in the
* same way as a half-write (which causes us to sit in read forever)
*
* We only arm/disarm inside the server io lock because it's common
* during migrations for us to be hanging on that mutex for quite
* a while while the final pass happens - it's held for the entire
* time.
*/
client_arm_killswitch( client );
client_reply( client, request ); client_reply( client, request );
client_disarm_killswitch( client );
stop = 0; stop = 0;
} }
} }
client_disarm_killswitch( client );
return stop; return stop;
} }
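The killswitch described above relies on a signal that carries the client socket's fd in `sigev_value.sival_int`, with a single SA_SIGINFO handler calling `shutdown()` on it. How `client_arm_killswitch()` schedules that signal is not shown in this diff. The sketch below uses `sigqueue()` as a stand-in for the timer so the mechanism can be observed in isolation. The `sketch_` and `SKETCH_` names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/socket.h>
#include <unistd.h>

#define SKETCH_KILLSWITCH_SIGNAL (SIGRTMIN + 1)

/* One handler for all clients: the fd to act on arrives in the signal's
 * payload. shutdown() is on the POSIX list of async-signal-safe functions. */
static void sketch_killswitch_hit(int signo, siginfo_t *info, void *ctx)
{
    (void)signo;
    (void)ctx;
    shutdown(info->si_value.sival_int, SHUT_RDWR);
}

/* Install the handler once per process. Returns 0 on success, -1 on error. */
static int sketch_install_killswitch(void)
{
    struct sigaction act = { .sa_flags = SA_RESTART | SA_SIGINFO };
    act.sa_sigaction = sketch_killswitch_hit;
    sigemptyset(&act.sa_mask);
    return sigaction(SKETCH_KILLSWITCH_SIGNAL, &act, NULL);
}

/* Stand-in for the timer expiring: queue the signal with fd as its payload. */
static int sketch_fire_killswitch(int fd)
{
    union sigval value;
    assert(fd >= 0);
    value.sival_int = fd;
    return sigqueue(getpid(), SKETCH_KILLSWITCH_SIGNAL, value);
}
```

After the handler has run, a `read()` on the socket returns instead of blocking, and the owning thread stays responsible for the final `close()`. That matches the ownership rule stated in the `client_create` comment.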
@@ -596,6 +636,9 @@ void client_cleanup(struct client* client,
{ {
info("client cleanup for client %p", client); info("client cleanup for client %p", client);
/* If the thread hits an error, we need to ensure this is off */
client_disarm_killswitch( client );
if (client->socket) { if (client->socket) {
FATAL_IF_NEGATIVE( close(client->socket), FATAL_IF_NEGATIVE( close(client->socket),
"Error closing client socket %d", "Error closing client socket %d",

@@ -4,18 +4,6 @@
#include <signal.h> #include <signal.h>
#include <time.h> #include <time.h>
#ifdef HAS_LISTEN_TIMEOUT
/** CLIENT_MAX_WAIT_SECS
* This is the length of time an inbound migration will wait for a fresh
* write before assuming the source has Gone Away. Note: it is *not*
* the time from one write to the next, it is the gap between the end of
* one write and the start of the next.
*/
#define CLIENT_MAX_WAIT_SECS 5
#endif
/** CLIENT_HANDLER_TIMEOUT /** CLIENT_HANDLER_TIMEOUT
* This is the length of time (in seconds) any request can be outstanding for. * This is the length of time (in seconds) any request can be outstanding for.
* If we spend longer than this in a request, the whole server is killed. * If we spend longer than this in a request, the whole server is killed.
@@ -24,8 +12,7 @@
/** CLIENT_KILLSWITCH_SIGNAL /** CLIENT_KILLSWITCH_SIGNAL
* The signal number we use to kill the server when *any* killswitch timer * The signal number we use to kill the server when *any* killswitch timer
* fires. We don't actually need to install a signal handler for it, the default * fires. The handler gets the fd of the client socket to work with.
* behaviour is perfectly fine.
*/ */
#define CLIENT_KILLSWITCH_SIGNAL ( SIGRTMIN + 1 ) #define CLIENT_KILLSWITCH_SIGNAL ( SIGRTMIN + 1 )
@@ -58,6 +45,7 @@ struct client {
}; };
void client_killswitch_hit(int signal, siginfo_t *info, void *ptr);
void* client_serve(void* client_uncast); void* client_serve(void* client_uncast);
struct client * client_create( struct server * serve, int socket ); struct client * client_create( struct server * serve, int socket );

@@ -101,12 +101,24 @@ struct flexnbd * flexnbd_create_serving(
max_nbd_clients, max_nbd_clients,
use_killswitch, use_killswitch,
1); 1);
flexnbd_create_shared( flexnbd, flexnbd_create_shared( flexnbd, s_ctrl_sock );
s_ctrl_sock );
// Beats installing one handler per client instance
if ( use_killswitch ) {
struct sigaction act = {
.sa_sigaction = client_killswitch_hit,
.sa_flags = SA_RESTART | SA_SIGINFO
};
FATAL_UNLESS(
0 == sigaction( CLIENT_KILLSWITCH_SIGNAL, &act, NULL ),
"Installing client killswitch signal failed"
);
}
return flexnbd; return flexnbd;
} }
struct flexnbd * flexnbd_create_listening( struct flexnbd * flexnbd_create_listening(
char* s_ip_address, char* s_ip_address,
char* s_port, char* s_port,
@@ -127,6 +139,10 @@ struct flexnbd * flexnbd_create_listening(
s_acl_entries, s_acl_entries,
1, 0, 0); 1, 0, 0);
flexnbd_create_shared( flexnbd, s_ctrl_sock ); flexnbd_create_shared( flexnbd, s_ctrl_sock );
// listen can't use killswitch, as mirror may pause on sending things
// for a very long time.
return flexnbd; return flexnbd;
} }

@@ -5,6 +5,7 @@
#include "mirror.h" #include "mirror.h"
#include "serve.h" #include "serve.h"
#include "proxy.h" #include "proxy.h"
#include "client.h"
#include "self_pipe.h" #include "self_pipe.h"
#include "mbox.h" #include "mbox.h"
#include "control.h" #include "control.h"

@@ -70,6 +70,7 @@ struct mirror_ctrl {
/* libev stuff */ /* libev stuff */
struct ev_loop *ev_loop; struct ev_loop *ev_loop;
ev_timer begin_watcher;
ev_io read_watcher; ev_io read_watcher;
ev_io write_watcher; ev_io write_watcher;
ev_timer timeout_watcher; ev_timer timeout_watcher;
@@ -138,7 +139,7 @@ enum mirror_state mirror_get_state( struct mirror * mirror )
void mirror_init( struct mirror * mirror, const char * filename ) void mirror_init( struct mirror * mirror, const char * filename )
{ {
int map_fd; int map_fd;
off64_t size; uint64_t size;
NULLCHECK( mirror ); NULLCHECK( mirror );
NULLCHECK( filename ); NULLCHECK( filename );
@@ -213,18 +214,6 @@ void mirror_destroy( struct mirror *mirror )
/** The mirror code will split NBD writes, making them this long as a maximum */ /** The mirror code will split NBD writes, making them this long as a maximum */
static const int mirror_longest_write = 8<<20; static const int mirror_longest_write = 8<<20;
/** If, during a mirror pass, we have sent this number of bytes or fewer, we
* go to freeze the I/O and finish it off. This is just a guess.
*/
static const unsigned int mirror_last_pass_after_bytes_written = 100<<20;
/** The largest number of full passes we'll do - the last one will always
* cause the I/O to freeze, however many bytes are left to copy.
*/
static const int mirror_maximum_passes = 7;
#define mirror_last_pass (mirror_maximum_passes - 1)
/* This must not be called if there's any chance of further I/O. Methods to /* This must not be called if there's any chance of further I/O. Methods to
* ensure this include: * ensure this include:
* - Ensure image size is 0 * - Ensure image size is 0
@@ -281,7 +270,7 @@ void mirror_cleanup( struct server * serve,
} }
int mirror_connect( struct mirror * mirror, off64_t local_size ) int mirror_connect( struct mirror * mirror, uint64_t local_size )
{ {
struct sockaddr * connect_from = NULL; struct sockaddr * connect_from = NULL;
int connected = 0; int connected = 0;
@@ -303,7 +292,7 @@ int mirror_connect( struct mirror * mirror, off64_t local_size )
"Select failed." ); "Select failed." );
if( FD_ISSET( mirror->client, &fds ) ){ if( FD_ISSET( mirror->client, &fds ) ){
off64_t remote_size; uint64_t remote_size;
if ( socket_nbd_read_hello( mirror->client, &remote_size ) ) { if ( socket_nbd_read_hello( mirror->client, &remote_size ) ) {
if( remote_size == local_size ){ if( remote_size == local_size ){
connected = 1; connected = 1;
@@ -347,6 +336,19 @@ int mirror_should_quit( struct mirror * mirror )
} }
} }
/* Bandwidth limiting - we hang around if bps is too high, unless we need to
* empty out the bitset stream a bit */
int mirror_should_wait( struct mirror_ctrl *ctrl )
{
int bps_over = server_mirror_bps( ctrl->serve ) >
ctrl->serve->mirror->max_bytes_per_second;
int stream_full = bitset_stream_size( ctrl->serve->allocation_map ) >
( BITSET_STREAM_SIZE / 2 );
return bps_over && !stream_full;
}
/* /*
* If there's an event in the bitset stream of the serve allocation map, we * If there's an event in the bitset stream of the serve allocation map, we
* use it to construct the next transfer request, covering precisely the area * use it to construct the next transfer request, covering precisely the area
@@ -369,7 +371,7 @@ int mirror_setup_next_xfer( struct mirror_ctrl *ctrl )
* full, and stop when it's a quarter full. This stops a busy client from * full, and stop when it's a quarter full. This stops a busy client from
* stalling a migration forever. FIXME: made-up numbers. * stalling a migration forever. FIXME: made-up numbers.
*/ */
if ( bitset_stream_size( serve->allocation_map ) > BITSET_STREAM_SIZE / 2 ) { if ( mirror->offset < serve->size && bitset_stream_size( serve->allocation_map ) > BITSET_STREAM_SIZE / 2 ) {
ctrl->clear_events = 1; ctrl->clear_events = 1;
} }
@@ -410,7 +412,7 @@ int mirror_setup_next_xfer( struct mirror_ctrl *ctrl )
struct nbd_request req = { struct nbd_request req = {
.magic = REQUEST_MAGIC, .magic = REQUEST_MAGIC,
.type = REQUEST_WRITE, .type = REQUEST_WRITE,
.handle = ".MIRROR.", .handle.b = ".MIRROR.",
.from = current, .from = current,
.len = run .len = run
}; };
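The `.handle.b` initialiser suggests the 8-byte NBD handle is now a union with a byte-array member. Only the member name `b` is visible in this diff, so the sketch below is an assumption about the layout: the type name `handle_sketch` and the 64-bit member `w` are hypothetical. It shows why an 8-byte handle can be compared as a single integer while still being matched against a fixed tag with `memcmp`.

```c
#include <stdint.h>
#include <string.h>

/* Assumed layout: `b` appears in the diff, `w` is a hypothetical name. */
union handle_sketch {
    char     b[8];   /* raw bytes as they travel on the wire */
    uint64_t w;      /* the same 8 bytes viewed as one integer */
};

/* Two handles are equal when all 8 bytes match, so one integer
 * comparison is enough. */
int handles_equal(const union handle_sketch *x, const union handle_sketch *y)
{
    return x->w == y->w;
}

/* Matching against a fixed 8-character tag still works on the bytes. */
int handle_is_mirror(const union handle_sketch *h)
{
    return memcmp(".MIRROR.", h->b, 8) == 0;
}
```

The integer view is only used for equality here, so byte order does not matter.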
@@ -425,24 +427,6 @@ int mirror_setup_next_xfer( struct mirror_ctrl *ctrl )
return 1; return 1;
} }
uint64_t mirror_current_bps( struct mirror * mirror )
{
uint64_t duration_ms = monotonic_time_ms() - mirror->migration_started;
return mirror->all_dirty / ( ( duration_ms / 1000 ) + 1 );
}
int mirror_exceeds_max_bps( struct mirror * mirror )
{
uint64_t mig_speed = mirror_current_bps( mirror );
debug( "current_bps: %"PRIu64"; max_bps: %"PRIu64, mig_speed, mirror->max_bytes_per_second );
if ( mig_speed > mirror->max_bytes_per_second ) {
return 1;
}
return 0;
}
// ONLY CALL THIS AFTER CLOSING CLIENTS // ONLY CALL THIS AFTER CLOSING CLIENTS
void mirror_complete( struct server *serve ) void mirror_complete( struct server *serve )
{ {
@@ -478,6 +462,12 @@ static void mirror_write_cb( struct ev_loop *loop, ev_io *w, int revents )
debug( "Mirror write callback invoked with events %d. fd: %i", revents, ctrl->mirror->client ); debug( "Mirror write callback invoked with events %d. fd: %i", revents, ctrl->mirror->client );
/* FIXME: We can end up corking multiple times in unusual circumstances; this
* is annoying, but harmless */
if ( xfer->written == 0 ) {
sock_set_tcp_cork( ctrl->mirror->client, 1 );
}
if ( xfer->written < hdr_size ) { if ( xfer->written < hdr_size ) {
data_loc = ( (char*) &xfer->hdr.req_raw ) + ctrl->xfer.written; data_loc = ( (char*) &xfer->hdr.req_raw ) + ctrl->xfer.written;
to_write = hdr_size - xfer->written; to_write = hdr_size - xfer->written;
@@ -486,7 +476,7 @@ static void mirror_write_cb( struct ev_loop *loop, ev_io *w, int revents )
to_write = xfer->len - ( ctrl->xfer.written - hdr_size ); to_write = xfer->len - ( ctrl->xfer.written - hdr_size );
} }
// Actually read some bytes // Actually write some bytes
if ( ( count = write( ctrl->mirror->client, data_loc, to_write ) ) < 0 ) { if ( ( count = write( ctrl->mirror->client, data_loc, to_write ) ) < 0 ) {
if ( errno != EAGAIN && errno != EWOULDBLOCK && errno != EINTR ) { if ( errno != EAGAIN && errno != EWOULDBLOCK && errno != EINTR ) {
warn( SHOW_ERRNO( "Couldn't write to listener" ) ); warn( SHOW_ERRNO( "Couldn't write to listener" ) );
@@ -496,13 +486,16 @@ static void mirror_write_cb( struct ev_loop *loop, ev_io *w, int revents )
} }
debug( "Wrote %"PRIu64" bytes", count ); debug( "Wrote %"PRIu64" bytes", count );
debug( "to_write was %"PRIu64", xfer->written was %"PRIu64, to_write, xfer->written ); debug( "to_write was %"PRIu64", xfer->written was %"PRIu64, to_write, xfer->written );
ctrl->xfer.written += count;
// We wrote some bytes, so reset the timer // We wrote some bytes, so reset the timer and keep track for the next pass
ev_timer_again( ctrl->ev_loop, &ctrl->timeout_watcher ); if ( count > 0 ) {
ctrl->xfer.written += count;
ev_timer_again( ctrl->ev_loop, &ctrl->timeout_watcher );
}
// All bytes written, so now we need to read the NBD reply back. // All bytes written, so now we need to read the NBD reply back.
if ( ctrl->xfer.written == ctrl->xfer.len + hdr_size ) { if ( ctrl->xfer.written == ctrl->xfer.len + hdr_size ) {
sock_set_tcp_cork( ctrl->mirror->client, 0 ) ;
ev_io_start( loop, &ctrl->read_watcher ); ev_io_start( loop, &ctrl->read_watcher );
ev_io_stop( loop, &ctrl->write_watcher ); ev_io_stop( loop, &ctrl->write_watcher );
} }
@@ -575,7 +568,7 @@ static void mirror_read_cb( struct ev_loop *loop, ev_io *w, int revents )
return; return;
} }
if ( memcmp( ".MIRROR.", &rsp.handle[0], 8 ) != 0 ) { if ( memcmp( ".MIRROR.", rsp.handle.b, 8 ) != 0 ) {
warn( "Bad handle returned from listener" ); warn( "Bad handle returned from listener" );
ev_break( loop, EVBREAK_ONE ); ev_break( loop, EVBREAK_ONE );
return; return;
@@ -584,10 +577,17 @@ static void mirror_read_cb( struct ev_loop *loop, ev_io *w, int revents )
/* transfer was completed, so now we need to either set up the next /* transfer was completed, so now we need to either set up the next
* transfer of this pass, set up the first transfer of the next pass, or * transfer of this pass, set up the first transfer of the next pass, or
* complete the migration */ * complete the migration */
m->all_dirty += xfer->len;
xfer->read = 0; xfer->read = 0;
xfer->written = 0; xfer->written = 0;
/* We don't account for bytes written in this mode, to stop high-throughput
* discs getting stuck in "drain the event queue!" mode forever
*/
if ( !ctrl->clear_events ) {
m->all_dirty += xfer->len;
}
/* This next bit could take a little while, which is fine */ /* This next bit could take a little while, which is fine */
ev_timer_stop( ctrl->ev_loop, &ctrl->timeout_watcher ); ev_timer_stop( ctrl->ev_loop, &ctrl->timeout_watcher );
@@ -601,17 +601,15 @@ static void mirror_read_cb( struct ev_loop *loop, ev_io *w, int revents )
int next_xfer = mirror_setup_next_xfer( ctrl ); int next_xfer = mirror_setup_next_xfer( ctrl );
debug( "next_xfer: %d", next_xfer ); debug( "next_xfer: %d", next_xfer );
/* Regardless of time estimates, if there's no waiting transfer, we can /* Regardless of time estimates, if there's no waiting transfer, we can start closing clients down. */
* */ if ( !ctrl->clients_closed && ( !next_xfer || server_mirror_eta( ctrl->serve ) < MS_CONVERGE_TIME_SECS ) ) {
if ( !ctrl->clients_closed && ( !next_xfer || server_mirror_eta( ctrl->serve ) < 60 ) ) {
info( "Closing clients to allow mirroring to converge" ); info( "Closing clients to allow mirroring to converge" );
server_forbid_new_clients( ctrl->serve ); server_forbid_new_clients( ctrl->serve );
server_close_clients( ctrl->serve ); server_close_clients( ctrl->serve );
server_join_clients( ctrl->serve ); server_join_clients( ctrl->serve );
ctrl->clients_closed = 1; ctrl->clients_closed = 1;
/* One more try - a new event may have been pushed since our last check /* One more try - a new event may have been pushed since our last check */
*/
if ( !next_xfer ) { if ( !next_xfer ) {
next_xfer = mirror_setup_next_xfer( ctrl ); next_xfer = mirror_setup_next_xfer( ctrl );
} }
@@ -630,7 +628,7 @@ static void mirror_read_cb( struct ev_loop *loop, ev_io *w, int revents )
/* FIXME: Should we ignore the bwlimit after server_close_clients has been called? */ /* FIXME: Should we ignore the bwlimit after server_close_clients has been called? */
if ( mirror_exceeds_max_bps( m ) ) { if ( mirror_should_wait( ctrl ) ) {
/* We're over the bandwidth limit, so don't move onto the next transfer /* We're over the bandwidth limit, so don't move onto the next transfer
* yet. Our limit_watcher will move us on once we're OK. timeout_watcher * yet. Our limit_watcher will move us on once we're OK. timeout_watcher
* was disabled further up, so don't need to stop it here too */ * was disabled further up, so don't need to stop it here too */
@@ -645,7 +643,7 @@ static void mirror_read_cb( struct ev_loop *loop, ev_io *w, int revents )
return; return;
} }
void mirror_timeout_cb( struct ev_loop *loop, ev_timer *w __attribute__((unused)), int revents ) static void mirror_timeout_cb( struct ev_loop *loop, ev_timer *w __attribute__((unused)), int revents )
{ {
if ( !(revents & EV_TIMER ) ) { if ( !(revents & EV_TIMER ) ) {
warn( "Mirror timeout called but no timer event signalled" ); warn( "Mirror timeout called but no timer event signalled" );
@@ -657,7 +655,7 @@ void mirror_timeout_cb( struct ev_loop *loop, ev_timer *w __attribute__((unused)
return; return;
} }
void mirror_abandon_cb( struct ev_loop *loop, ev_io *w, int revents ) static void mirror_abandon_cb( struct ev_loop *loop, ev_io *w, int revents )
{ {
struct mirror_ctrl* ctrl = (struct mirror_ctrl*) w->data; struct mirror_ctrl* ctrl = (struct mirror_ctrl*) w->data;
NULLCHECK( ctrl ); NULLCHECK( ctrl );
@@ -674,7 +672,8 @@ void mirror_abandon_cb( struct ev_loop *loop, ev_io *w, int revents )
return; return;
} }
void mirror_limit_cb( struct ev_loop *loop, ev_timer *w, int revents )
static void mirror_limit_cb( struct ev_loop *loop, ev_timer *w, int revents )
{ {
struct mirror_ctrl* ctrl = (struct mirror_ctrl*) w->data; struct mirror_ctrl* ctrl = (struct mirror_ctrl*) w->data;
NULLCHECK( ctrl ); NULLCHECK( ctrl );
@@ -684,7 +683,7 @@ void mirror_limit_cb( struct ev_loop *loop, ev_timer *w, int revents )
return; return;
} }
if ( mirror_exceeds_max_bps( ctrl->mirror ) ) { if ( mirror_should_wait( ctrl ) ) {
debug( "max_bps exceeded, waiting", ctrl->mirror->max_bytes_per_second ); debug( "max_bps exceeded, waiting", ctrl->mirror->max_bytes_per_second );
ev_timer_again( loop, w ); ev_timer_again( loop, w );
} else { } else {
@@ -698,6 +697,37 @@ void mirror_limit_cb( struct ev_loop *loop, ev_timer *w, int revents )
return; return;
} }
/* We use this to periodically check whether the allocation map has built, and
* if it has, start migrating. If it's not finished, then enabling the bitset
* stream does not go well for us.
*/
static void mirror_begin_cb( struct ev_loop *loop, ev_timer *w, int revents )
{
struct mirror_ctrl* ctrl = (struct mirror_ctrl*) w->data;
NULLCHECK( ctrl );
if ( !(revents & EV_TIMER ) ) {
warn( "Mirror begin callback executed but no timer event signalled" );
return;
}
if ( ctrl->serve->allocation_map_built || ctrl->serve->allocation_map_not_built ) {
info( "allocation map builder is finished, beginning migration" );
ev_timer_stop( loop, w );
/* Start by writing xfer 0 to the listener */
ev_io_start( loop, &ctrl->write_watcher );
/* We want to timeout during the first write as well as subsequent ones */
ev_timer_again( loop, &ctrl->timeout_watcher );
/* We're now interested in events */
bitset_enable_stream( ctrl->serve->allocation_map );
} else {
/* not done yet, so wait another second */
ev_timer_again( loop, w );
}
return;
}
void mirror_run( struct server *serve ) void mirror_run( struct server *serve )
{ {
NULLCHECK( serve ); NULLCHECK( serve );
@@ -727,7 +757,12 @@ void mirror_run( struct server *serve )
ctrl.ev_loop = EV_DEFAULT; ctrl.ev_loop = EV_DEFAULT;
/* gcc warns on -O2. clang is fine. Seems to be the fault of ev.h */ /* gcc warns with -Wstrict-aliasing on -O2. clang doesn't
* implement this warning. Seems to be the fault of ev.h */
ev_init( &ctrl.begin_watcher, mirror_begin_cb );
ctrl.begin_watcher.repeat = 1.0; // We check for a finished allocation map every second.
ctrl.begin_watcher.data = (void*) &ctrl;
ev_io_init( &ctrl.read_watcher, mirror_read_cb, m->client, EV_READ ); ev_io_init( &ctrl.read_watcher, mirror_read_cb, m->client, EV_READ );
ctrl.read_watcher.data = (void*) &ctrl; ctrl.read_watcher.data = (void*) &ctrl;
@@ -735,7 +770,22 @@ void mirror_run( struct server *serve )
ctrl.write_watcher.data = (void*) &ctrl; ctrl.write_watcher.data = (void*) &ctrl;
ev_init( &ctrl.timeout_watcher, mirror_timeout_cb ); ev_init( &ctrl.timeout_watcher, mirror_timeout_cb );
ctrl.timeout_watcher.repeat = MS_REQUEST_LIMIT_SECS_F ;
char * env_request_limit = getenv( "FLEXNBD_MS_REQUEST_LIMIT_SECS" );
double timeout_limit = MS_REQUEST_LIMIT_SECS_F;
if ( NULL != env_request_limit ) {
char *endptr = NULL;
errno = 0;
double limit = strtod( env_request_limit, &endptr );
warn( SHOW_ERRNO( "Got %f from strtod", limit ) );
if ( errno == 0 ) {
timeout_limit = limit;
}
}
ctrl.timeout_watcher.repeat = timeout_limit;
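The environment override above accepts whatever `strtod` returns as long as `errno` stays zero. A stricter variant is sketched below as one possible hardening, not as the project's behaviour: `parse_timeout_secs` is a hypothetical helper that also rejects empty input, trailing garbage, and values that are not positive and finite.

```c
#include <errno.h>
#include <math.h>
#include <stdlib.h>

/* Parse a timeout in seconds from `text`. Returns `fallback` when the text
 * is NULL, empty, not fully numeric, out of range, or not a positive
 * finite number. (Hypothetical helper, not part of flexnbd.) */
double parse_timeout_secs(const char *text, double fallback)
{
    if (text == NULL) {
        return fallback;
    }

    char *endptr = NULL;
    errno = 0;
    double value = strtod(text, &endptr);

    if (errno != 0) {
        return fallback;                 /* overflow or underflow */
    }
    if (endptr == text || *endptr != '\0') {
        return fallback;                 /* nothing parsed, or trailing junk */
    }
    if (!isfinite(value) || value <= 0.0) {
        return fallback;                 /* inf, NaN, zero or negative */
    }
    return value;
}
```

Under these assumptions the call site would reduce to `parse_timeout_secs( getenv( "FLEXNBD_MS_REQUEST_LIMIT_SECS" ), MS_REQUEST_LIMIT_SECS_F )`.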
ev_init( &ctrl.limit_watcher, mirror_limit_cb ); ev_init( &ctrl.limit_watcher, mirror_limit_cb );
ctrl.limit_watcher.repeat = 1.0; // We check bps every second. seems sane. ctrl.limit_watcher.repeat = 1.0; // We check bps every second. seems sane.
@@ -751,19 +801,23 @@ void mirror_run( struct server *serve )
"Couldn't find first transfer for mirror!" "Couldn't find first transfer for mirror!"
); );
/* Start by writing xfer 0 to the listener */
ev_io_start( ctrl.ev_loop, &ctrl.write_watcher );
/* We want to timeout during the first write as well as subsequent ones */ if ( serve->allocation_map_built ) {
ev_timer_again( ctrl.ev_loop, &ctrl.timeout_watcher ); /* Start by writing xfer 0 to the listener */
ev_io_start( ctrl.ev_loop, &ctrl.write_watcher );
/* We want to timeout during the first write as well as subsequent ones */
ev_timer_again( ctrl.ev_loop, &ctrl.timeout_watcher );
bitset_enable_stream( serve->allocation_map );
} else {
debug( "Waiting for allocation map to be built" );
ev_timer_again( ctrl.ev_loop, &ctrl.begin_watcher );
}
/* Everything up to here is blocking. We switch to non-blocking so we /* Everything up to here is blocking. We switch to non-blocking so we
* can handle rate-limiting and weird error conditions better. TODO: We * can handle rate-limiting and weird error conditions better. TODO: We
* should expand the event loop upwards so we can do the same there too */ * should expand the event loop upwards so we can do the same there too */
sock_set_nonblock( m->client, 1 ); sock_set_nonblock( m->client, 1 );
bitset_enable_stream( serve->allocation_map );
info( "Entering event loop" ); info( "Entering event loop" );
ev_run( ctrl.ev_loop, 0 ); ev_run( ctrl.ev_loop, 0 );
info( "Exited event loop" ); info( "Exited event loop" );
@@ -784,12 +838,11 @@ void mirror_run( struct server *serve )
* call retries the migration from scratch. */ * call retries the migration from scratch. */
if ( m->commit_state != MS_DONE ) { if ( m->commit_state != MS_DONE ) {
error( "Event loop exited, but mirroring is not complete" );
/* mirror_reset will be called before a retry, so keeping hold of events /* mirror_reset will be called before a retry, so keeping hold of events
* between now and our next mirroring attempt is not useful * between now and our next mirroring attempt is not useful
*/ */
bitset_disable_stream( serve->allocation_map ); bitset_disable_stream( serve->allocation_map );
error( "Event loop exited, but mirroring is not complete" );
} }
return; return;
@@ -869,7 +922,7 @@ void* mirror_runner(void* serve_params_uncast)
* for us ). But if we've failed and are going to retry on the next run, we * for us ). But if we've failed and are going to retry on the next run, we
* must close this socket here to have any chance of it succeeding. * must close this socket here to have any chance of it succeeding.
*/ */
if ( !mirror->client < 0 ) { if ( !(mirror->client < 0) ) {
sock_try_close( mirror->client ); sock_try_close( mirror->client );
mirror->client = -1; mirror->client = -1;
} }
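The parenthesisation fix in this hunk matters because unary `!` binds more tightly than `<`. The sketch below spells out how the old expression is evaluated; both function names are hypothetical and exist only for illustration.

```c
/* How `!fd < 0` is evaluated: the negation happens first and yields
 * 0 or 1, which is never negative, so the test can never succeed. */
int buggy_fd_is_open(int fd)
{
    int negated = !fd;      /* 0 or 1 */
    return negated < 0;     /* always 0 */
}

/* The corrected form negates the whole comparison. */
int fixed_fd_is_open(int fd)
{
    return !(fd < 0);       /* 1 for any descriptor >= 0 */
}
```

With the old expression the socket was never closed on this path, whatever value `mirror->client` held.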
@@ -1016,4 +1069,3 @@ void * mirror_super_runner( void * serve_uncast )
return NULL; return NULL;
} }


@@ -18,6 +18,18 @@ enum mirror_state;
*/ */
#define MS_CONNECT_TIME_SECS 60 #define MS_CONNECT_TIME_SECS 60
/* MS_CONVERGE_TIME_SECS
* Once the estimated time remaining for a migration drops below this many
* seconds, we disconnect clients so the mirror can converge.
*
* TODO: Make this configurable so refusing-to-converge clients can be manually
* fixed.
* TODO: Make this adaptive - 5 seconds is fine, as long as we can guarantee
* that all migrations will be able to converge in time. We'd add a new
* state between open and closed, where gradually-increasing latency is
* added to client requests to allow the mirror to be faster.
*/
#define MS_CONVERGE_TIME_SECS 5
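How this threshold is used in `mirror_read_cb` can be summarised as a small predicate. This is a sketch under stated assumptions: `should_close_clients` and `CONVERGE_TIME_SECS` are hypothetical names, and the arguments stand in for `ctrl->clients_closed`, the result of `mirror_setup_next_xfer()` and `server_mirror_eta()`.

```c
#include <stdint.h>

/* Stand-in for MS_CONVERGE_TIME_SECS. */
#define CONVERGE_TIME_SECS 5

/* Clients are closed at most once, and only when either nothing is left
 * to transfer or the estimated time remaining is under the threshold. */
int should_close_clients(int clients_closed, int have_next_xfer,
                         uint64_t eta_secs)
{
    if (clients_closed) {
        return 0;
    }
    return !have_next_xfer || eta_secs < CONVERGE_TIME_SECS;
}
```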
/* MS_HELLO_TIME_SECS /* MS_HELLO_TIME_SECS
* The length of time the sender will wait for the NBD hello message * The length of time the sender will wait for the NBD hello message
@@ -38,9 +50,12 @@ enum mirror_state;
* request, this is the time between the end of the NBD request and the * request, this is the time between the end of the NBD request and the
* start of the NBD reply. For a write request, this is the time * start of the NBD reply. For a write request, this is the time
* between the end of the written data and the start of the NBD reply. * between the end of the written data and the start of the NBD reply.
* Can be overridden by the environment variable:
* FLEXNBD_MS_REQUEST_LIMIT_SECS
*/ */
#define MS_REQUEST_LIMIT_SECS 4
#define MS_REQUEST_LIMIT_SECS_F 4.0 #define MS_REQUEST_LIMIT_SECS 60
#define MS_REQUEST_LIMIT_SECS_F 60.0
enum mirror_finish_action { enum mirror_finish_action {
ACTION_EXIT, ACTION_EXIT,
@@ -122,7 +137,5 @@ struct mirror_super * mirror_super_create(
); );
void * mirror_super_runner( void * serve_uncast ); void * mirror_super_runner( void * serve_uncast );
uint64_t mirror_current_bps( struct mirror * mirror );
#endif #endif


@@ -220,7 +220,6 @@ void read_serve_param( int c, char **ip_addr, char **ip_port, char **file, char
case 'h': case 'h':
fprintf(stdout, "%s\n", serve_help_text ); fprintf(stdout, "%s\n", serve_help_text );
exit( 0 ); exit( 0 );
break;
case 'l': case 'l':
*ip_addr = optarg; *ip_addr = optarg;
break; break;
@@ -263,7 +262,6 @@ void read_listen_param( int c,
case 'h': case 'h':
fprintf(stdout, "%s\n", listen_help_text ); fprintf(stdout, "%s\n", listen_help_text );
exit(0); exit(0);
break;
case 'l': case 'l':
*ip_addr = optarg; *ip_addr = optarg;
break; break;
@@ -297,7 +295,6 @@ void read_readwrite_param( int c, char **ip_addr, char **ip_port, char **bind_ad
case 'h': case 'h':
fprintf(stdout, "%s\n", err_text ); fprintf(stdout, "%s\n", err_text );
exit( 0 ); exit( 0 );
break;
case 'l': case 'l':
*ip_addr = optarg; *ip_addr = optarg;
break; break;
@@ -331,7 +328,6 @@ void read_sock_param( int c, char **sock, char *help_text )
case 'h': case 'h':
fprintf( stdout, "%s\n", help_text ); fprintf( stdout, "%s\n", help_text );
exit( 0 ); exit( 0 );
break;
case 's': case 's':
*sock = optarg; *sock = optarg;
break; break;
@@ -362,7 +358,6 @@ void read_mirror_speed_param(
case 'h': case 'h':
fprintf( stdout, "%s\n", mirror_speed_help_text ); fprintf( stdout, "%s\n", mirror_speed_help_text );
exit( 0 ); exit( 0 );
break;
case 's': case 's':
*sock = optarg; *sock = optarg;
break; break;
@@ -394,7 +389,6 @@ void read_mirror_param(
case 'h': case 'h':
fprintf( stdout, "%s\n", mirror_help_text ); fprintf( stdout, "%s\n", mirror_help_text );
exit( 0 ); exit( 0 );
break;
case 's': case 's':
*sock = optarg; *sock = optarg;
break; break;
@@ -428,7 +422,6 @@ void read_break_param( int c, char **sock )
case 'h': case 'h':
fprintf( stdout, "%s\n", break_help_text ); fprintf( stdout, "%s\n", break_help_text );
exit( 0 ); exit( 0 );
break;
case 's': case 's':
*sock = optarg; *sock = optarg;
break; break;
@@ -580,7 +573,10 @@ void params_readwrite(
parse_port( s_port, &out->connect_to.v4 ); parse_port( s_port, &out->connect_to.v4 );
out->from = atol(s_from); long signed_from = atol(s_from);
FATAL_IF_NEGATIVE( signed_from,
"Can't read from a negative offset %ld.", signed_from);
out->from = signed_from;
if (write_not_read) { if (write_not_read) {
if (s_length_or_filename[0]-48 < 10) { if (s_length_or_filename[0]-48 < 10) {
@@ -592,9 +588,10 @@ void params_readwrite(
s_length_or_filename, O_RDONLY); s_length_or_filename, O_RDONLY);
FATAL_IF_NEGATIVE(out->data_fd, FATAL_IF_NEGATIVE(out->data_fd,
"Couldn't open %s", s_length_or_filename); "Couldn't open %s", s_length_or_filename);
out->len = lseek64(out->data_fd, 0, SEEK_END); off64_t signed_len = lseek64(out->data_fd, 0, SEEK_END);
FATAL_IF_NEGATIVE(out->len, FATAL_IF_NEGATIVE(signed_len,
"Couldn't find length of %s", s_length_or_filename); "Couldn't find length of %s", s_length_or_filename);
out->len = signed_len;
FATAL_IF_NEGATIVE( FATAL_IF_NEGATIVE(
lseek64(out->data_fd, 0, SEEK_SET), lseek64(out->data_fd, 0, SEEK_SET),
"Couldn't rewind %s", s_length_or_filename "Couldn't rewind %s", s_length_or_filename
@@ -787,7 +784,7 @@ int mode_break( int argc, char *argv[] )
if ( NULL == sock ){ if ( NULL == sock ){
fprintf( stderr, "--sock is required.\n" ); fprintf( stderr, "--sock is required.\n" );
exit_err( acl_help_text ); exit_err( break_help_text );
} }
do_remote_command( "break", sock, argc - optind, argv + optind ); do_remote_command( "break", sock, argc - optind, argv + optind );
@@ -808,7 +805,7 @@ int mode_status( int argc, char *argv[] )
if ( NULL == sock ){ if ( NULL == sock ){
fprintf( stderr, "--sock is required.\n" ); fprintf( stderr, "--sock is required.\n" );
exit_err( acl_help_text ); exit_err( status_help_text );
} }
do_remote_command( "status", sock, argc - optind, argv + optind ); do_remote_command( "status", sock, argc - optind, argv + optind );
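The `params_readwrite` changes in this file check a signed value before storing it in an unsigned field, because a negative number assigned directly to an unsigned type silently wraps to a very large offset. The sketch below shows the same idea with a return code instead of `FATAL_IF_NEGATIVE` and with `strtoll` swapped in for `atol`; `parse_offset` is a hypothetical helper, not part of the project.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse a non-negative decimal offset. Returns 0 and stores the value in
 * *out on success; returns -1 and leaves *out untouched when the text is
 * not a complete number, is out of range, or is negative. */
int parse_offset(const char *text, uint64_t *out)
{
    char *endptr = NULL;
    errno = 0;
    long long value = strtoll(text, &endptr, 10);

    if (errno != 0 || endptr == text || *endptr != '\0') {
        return -1;          /* out of range, empty, or trailing junk */
    }
    if (value < 0) {
        return -1;          /* check the sign BEFORE converting */
    }
    *out = (uint64_t) value;
    return 0;
}
```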


@@ -233,7 +233,6 @@ int tryjoin_client_thread( struct client_tbl_entry *entry, int (*joinfunc)(pthre
int was_closed = 0; int was_closed = 0;
void * status=NULL; void * status=NULL;
int join_errno;
if (entry->thread != 0) { if (entry->thread != 0) {
char s_client_address[128]; char s_client_address[128];
@@ -241,7 +240,7 @@ int tryjoin_client_thread( struct client_tbl_entry *entry, int (*joinfunc)(pthre
sockaddr_address_string( &entry->address.generic, &s_client_address[0], 128 ); sockaddr_address_string( &entry->address.generic, &s_client_address[0], 128 );
debug( "%s(%p,...)", joinfunc == pthread_join ? "joining" : "tryjoining", entry->thread ); debug( "%s(%p,...)", joinfunc == pthread_join ? "joining" : "tryjoining", entry->thread );
join_errno = joinfunc(entry->thread, &status); int join_errno = joinfunc(entry->thread, &status);
/* join_errno can legitimately be ESRCH if the thread is /* join_errno can legitimately be ESRCH if the thread is
* already dead, but the client still needs tidying up. */ * already dead, but the client still needs tidying up. */
@@ -598,7 +597,6 @@ int server_accept( struct server * params )
{ {
NULLCHECK( params ); NULLCHECK( params );
debug("accept loop starting"); debug("accept loop starting");
int client_fd;
union mysockaddr client_address; union mysockaddr client_address;
fd_set fds; fd_set fds;
socklen_t socklen=sizeof(client_address); socklen_t socklen=sizeof(client_address);
@@ -638,7 +636,7 @@ int server_accept( struct server * params )
} }
if ( FD_ISSET( params->server_fd, &fds ) ){ if ( FD_ISSET( params->server_fd, &fds ) ){
client_fd = accept( params->server_fd, &client_address.generic, &socklen ); int client_fd = accept( params->server_fd, &client_address.generic, &socklen );
if ( params->allow_new_clients ) { if ( params->allow_new_clients ) {
debug("Accepted nbd client socket fd %d", client_fd); debug("Accepted nbd client socket fd %d", client_fd);
@@ -686,6 +684,7 @@ void* build_allocation_map_thread(void* serve_uncast)
* the future, we'll need to wait for the allocation map to finish or * the future, we'll need to wait for the allocation map to finish or
* fail before we can complete the migration. * fail before we can complete the migration.
*/ */
serve->allocation_map_not_built = 1;
warn( "Didn't build allocation map for %s", serve->filename ); warn( "Didn't build allocation map for %s", serve->filename );
} }
@@ -740,11 +739,11 @@ void server_join_clients( struct server * serve ) {
for (i=0; i < serve->max_nbd_clients; i++) { for (i=0; i < serve->max_nbd_clients; i++) {
pthread_t thread_id = serve->nbd_client[i].thread; pthread_t thread_id = serve->nbd_client[i].thread;
int err = 0;
if (thread_id != 0) { if (thread_id != 0) {
debug( "joining thread %p", thread_id ); debug( "joining thread %p", thread_id );
if ( 0 == (err = pthread_join( thread_id, &status ) ) ) { int err = pthread_join( thread_id, &status );
if ( 0 == err ) {
serve->nbd_client[i].thread = 0; serve->nbd_client[i].thread = 0;
} else { } else {
warn( "Error %s (%i) joining thread %p", strerror( err ), err, thread_id ); warn( "Error %s (%i) joining thread %p", strerror( err ), err, thread_id );
@@ -878,7 +877,19 @@ uint64_t server_mirror_eta( struct server * serve )
{ {
if ( server_is_mirroring( serve ) ) { if ( server_is_mirroring( serve ) ) {
uint64_t bytes_to_xfer = server_mirror_bytes_remaining( serve ); uint64_t bytes_to_xfer = server_mirror_bytes_remaining( serve );
return bytes_to_xfer / ( mirror_current_bps( serve->mirror ) + 1 ); return bytes_to_xfer / ( server_mirror_bps( serve ) + 1 );
}
return 0;
}
uint64_t server_mirror_bps( struct server * serve )
{
if ( server_is_mirroring( serve ) ) {
uint64_t duration_ms =
monotonic_time_ms() - serve->mirror->migration_started;
return serve->mirror->all_dirty / ( ( duration_ms / 1000 ) + 1 );
} }
return 0; return 0;
@@ -941,4 +952,3 @@ int do_serve( struct server* params, struct self_pipe * open_signal )
return success; return success;
} }
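The throughput and ETA arithmetic that this file's diff moves into `server_mirror_bps` and `server_mirror_eta` is restated below as pure functions. It is a minimal sketch: `estimate_bps` and `estimate_eta_secs` are hypothetical names, and their arguments stand in for `all_dirty`, the elapsed time since `migration_started`, and `server_mirror_bytes_remaining()`.

```c
#include <stdint.h>

/* Average bytes per second. Elapsed time is truncated to whole seconds
 * and 1 is added, so the divisor is never zero. */
uint64_t estimate_bps(uint64_t bytes_sent, uint64_t duration_ms)
{
    return bytes_sent / ((duration_ms / 1000) + 1);
}

/* Estimated seconds remaining; the +1 again guards against dividing by
 * zero when nothing has been transferred yet. */
uint64_t estimate_eta_secs(uint64_t bytes_remaining, uint64_t bps)
{
    return bytes_remaining / (bps + 1);
}
```

Both `+ 1` terms bias the results slightly: the measured rate reads a little low and, because the ETA divides by `bps + 1`, the estimate also shrinks somewhat at very low rates.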


@@ -76,8 +76,10 @@ struct server {
struct bitset * allocation_map; struct bitset * allocation_map;
/* when starting up, this thread builds the allocation_map */ /* when starting up, this thread builds the allocation_map */
pthread_t allocation_map_builder_thread; pthread_t allocation_map_builder_thread;
/* when the thread has finished, it sets this to 1 */ /* when the thread has finished, it sets this to 1 */
volatile sig_atomic_t allocation_map_built; volatile sig_atomic_t allocation_map_built;
volatile sig_atomic_t allocation_map_not_built;
int max_nbd_clients; int max_nbd_clients;
struct client_tbl_entry *nbd_client; struct client_tbl_entry *nbd_client;
@@ -126,6 +128,7 @@ int server_is_mirroring( struct server * serve );
uint64_t server_mirror_bytes_remaining( struct server * serve ); uint64_t server_mirror_bytes_remaining( struct server * serve );
uint64_t server_mirror_eta( struct server * serve ); uint64_t server_mirror_eta( struct server * serve );
uint64_t server_mirror_bps( struct server * serve );
void server_abandon_mirror( struct server * serve ); void server_abandon_mirror( struct server * serve );
void server_prevent_mirror_start( struct server *serve ); void server_prevent_mirror_start( struct server *serve );
@@ -151,8 +154,10 @@ int do_serve( struct server *, struct self_pipe * );
struct mode_readwrite_params { struct mode_readwrite_params {
union mysockaddr connect_to; union mysockaddr connect_to;
union mysockaddr connect_from; union mysockaddr connect_from;
off64_t from;
off64_t len; uint64_t from;
uint32_t len;
int data_fd; int data_fd;
int client; int client;
}; };


@@ -27,7 +27,7 @@ struct status * status_create( struct server * serve )
status->migration_duration = 0; status->migration_duration = 0;
} }
status->migration_duration /= 1000; status->migration_duration /= 1000;
status->migration_speed = serve->mirror->all_dirty / ( status->migration_duration + 1 ); status->migration_speed = server_mirror_bps( serve );
status->migration_speed_limit = serve->mirror->max_bytes_per_second; status->migration_speed_limit = serve->mirror->max_bytes_per_second;
status->migration_seconds_left = server_mirror_eta( serve ); status->migration_seconds_left = server_mirror_eta( serve );


@@ -21,6 +21,11 @@ class Environment
@fake_pid = nil @fake_pid = nil
end end
def prefetch_proxy!
@nbd1.prefetch_proxy = true
@nbd2.prefetch_proxy = true
end
def proxy1(port=@port2) def proxy1(port=@port2)
@nbd1.proxy(@ip, port) @nbd1.proxy(@ip, port)
end end


@@ -20,7 +20,13 @@ t = Thread.start do
client2.close client2.close
end end
sleep( FlexNBD::MS_REQUEST_LIMIT_SECS + 2 ) sleep_time = if ENV.has_key?('FLEXNBD_MS_REQUEST_LIMIT_SECS')
ENV['FLEXNBD_MS_REQUEST_LIMIT_SECS'].to_f
else
FlexNBD::MS_REQUEST_LIMIT_SECS
end
sleep( sleep_time + 2.0 )
client1.close client1.close
t.join t.join


@@ -198,6 +198,8 @@ module FlexNBD
end end
end end
attr_accessor :prefetch_proxy
def initialize( bin, ip, port ) def initialize( bin, ip, port )
@bin = bin @bin = bin
@do_debug = ENV['DEBUG'] @do_debug = ENV['DEBUG']
@@ -208,6 +210,7 @@ module FlexNBD
@ip = ip @ip = ip
@port = port @port = port
@kill = [] @kill = []
@prefetch_proxy = false
end end
@@ -247,6 +250,7 @@ module FlexNBD
"--port #{port} "\ "--port #{port} "\
"--conn-addr #{connect_ip} "\ "--conn-addr #{connect_ip} "\
"--conn-port #{connect_port} "\ "--conn-port #{connect_port} "\
"#{prefetch_proxy ? "--cache " : ""}"\
"#{@debug}" "#{@debug}"
end end
@@ -458,12 +462,18 @@ module FlexNBD
def maybe_timeout(cmd, timeout=nil ) def maybe_timeout(cmd, timeout=nil )
stdout, stderr = "","" stdout, stderr = "",""
stat = nil
run = Proc.new do run = Proc.new do
Open3.popen3( cmd ) do |io_in, io_out, io_err| # Ruby 1.9 changed the popen3 api. instead of 3 args, the block
# gets 4. Not only that, but it no longer sets $?, so we have to
# go elsewhere for the process' exit status.
Open3.popen3( cmd ) do |io_in, io_out, io_err, maybe_thr|
io_in.close io_in.close
stdout.replace io_out.read stdout.replace io_out.read
stderr.replace io_err.read stderr.replace io_err.read
stat = maybe_thr.value if maybe_thr
end end
stat ||= $?
end end
if timeout if timeout
@@ -472,13 +482,13 @@ module FlexNBD
run.call run.call
end end
[stdout, stderr] [stdout, stderr, stat]
end end
def mirror(dest_ip, dest_port, bandwidth=nil, action=nil) def mirror(dest_ip, dest_port, bandwidth=nil, action=nil)
stdout, stderr = mirror_unchecked( dest_ip, dest_port, bandwidth, action ) stdout, stderr, status = mirror_unchecked( dest_ip, dest_port, bandwidth, action )
raise IOError.new( "Migrate command failed\n" + stderr) unless $?.success? raise IOError.new( "Migrate command failed\n" + stderr) unless status.success?
stdout stdout
end end


@@ -2,6 +2,14 @@
module FlexNBD module FlexNBD
def self.binary( str )
if str.respond_to? :force_encoding
str.force_encoding "ASCII-8BIT"
else
str
end
end
# eeevil is his one and only name... # eeevil is his one and only name...
def self.read_constants def self.read_constants
parents = [] parents = []
@@ -17,7 +25,7 @@ module FlexNBD
fail "No source root!" unless source_root fail "No source root!" unless source_root
headers = Dir[File.join( source_root, "src", "*.h" ) ] headers = Dir[File.join( source_root, "src", "{common,proxy,server}","*.h" ) ]
headers.each do |header_filename| headers.each do |header_filename|
txt_lines = File.readlines( header_filename ) txt_lines = File.readlines( header_filename )
@@ -33,8 +41,8 @@ module FlexNBD
read_constants() read_constants()
REQUEST_MAGIC = "\x25\x60\x95\x13" unless defined?(REQUEST_MAGIC) REQUEST_MAGIC = binary("\x25\x60\x95\x13") unless defined?(REQUEST_MAGIC)
REPLY_MAGIC = "\x67\x44\x66\x98" unless defined?(REPLY_MAGIC) REPLY_MAGIC = binary("\x67\x44\x66\x98") unless defined?(REPLY_MAGIC)
end # module FlexNBD end # module FlexNBD


@@ -138,7 +138,7 @@ module FlexNBD
end end
def accept( err_msg = "Timed out waiting for a connection", timeout = 2) def accept( err_msg = "Timed out waiting for a connection", timeout = 5)
client_sock = nil client_sock = nil
begin begin


@@ -0,0 +1,190 @@
# encoding: utf-8
require 'flexnbd/fake_source'
require 'flexnbd/fake_dest'
module ProxyTests
def with_proxied_client( override_size = nil )
@env.serve1 unless @server_up
@env.proxy2 unless @proxy_up
@env.nbd2.can_die(0)
client = FlexNBD::FakeSource.new(@env.ip, @env.port2, "Couldn't connect to proxy")
begin
result = client.read_hello
assert_equal "NBDMAGIC", result[:magic]
assert_equal override_size || @env.file1.size, result[:size]
yield client
ensure
client.close rescue nil
end
end
def test_exits_with_error_when_cannot_connect_to_upstream_on_start
assert_raises(RuntimeError) { @env.proxy1 }
end
def test_read_requests_successfully_proxied
with_proxied_client do |client|
(0..3).each do |n|
offset = n * 4096
client.write_read_request(offset, 4096, "myhandle")
rsp = client.read_response
assert_equal ::FlexNBD::REPLY_MAGIC, rsp[:magic]
assert_equal "myhandle", rsp[:handle]
assert_equal 0, rsp[:error]
orig_data = @env.file1.read(offset, 4096)
data = client.read_raw(4096)
assert_equal 4096, orig_data.size
assert_equal 4096, data.size
assert_equal( orig_data, data,
"Returned data does not match on request #{n+1}" )
end
end
end
def test_write_requests_successfully_proxied
with_proxied_client do |client|
(0..3).each do |n|
offset = n * 4096
client.write(offset, "\xFF" * 4096)
rsp = client.read_response
assert_equal FlexNBD::REPLY_MAGIC, rsp[:magic]
assert_equal "myhandle", rsp[:handle]
assert_equal 0, rsp[:error]
data = @env.file1.read(offset, 4096)
assert_equal( ( "\xFF" * 4096 ), data, "Data not written correctly (offset is #{n})" )
end
end
end
def make_fake_server
server = FlexNBD::FakeDest.new(@env.ip, @env.port1)
@server_up = true
# We return a thread here because accept() and connect() both block for us
Thread.new do
sc = server.accept # just tell the supervisor we're up
sc.write_hello
[ server, sc ]
end
end
def test_read_request_retried_when_upstream_dies_partway
maker = make_fake_server
with_proxied_client(4096) do |client|
server, sc1 = maker.value
# Send the read request to the proxy
client.write_read_request( 0, 4096 )
# ensure we're given the read request
req1 = sc1.read_request
assert_equal ::FlexNBD::REQUEST_MAGIC, req1[:magic]
assert_equal ::FlexNBD::REQUEST_READ, req1[:type]
assert_equal 0, req1[:from]
assert_not_equal 0, req1[:len]
# Kill the server again, now we're sure the read request has been sent once
sc1.close
# We expect the proxy to reconnect without our client doing anything.
sc2 = server.accept
sc2.write_hello
# And once reconnected, it should resend an identical request.
req2 = sc2.read_request
assert_equal req1, req2
# The reply should be proxied back to the client.
sc2.write_reply( req2[:handle] )
sc2.write_data( "\xFF" * 4096 )
# Check it to make sure it's correct
rsp = timeout(15) { client.read_response }
assert_equal ::FlexNBD::REPLY_MAGIC, rsp[:magic]
assert_equal 0, rsp[:error]
assert_equal req1[:handle], rsp[:handle]
data = client.read_raw( 4096 )
assert_equal( ("\xFF" * 4096), data, "Wrong data returned" )
sc2.close
server.close
end
end
def test_write_request_retried_when_upstream_dies_partway
maker = make_fake_server
with_proxied_client(4096) do |client|
server, sc1 = maker.value
# Send the write request to the proxy
client.write( 0, ( "\xFF" * 4096 ) )
# ensure we're given the write request
req1 = sc1.read_request
assert_equal ::FlexNBD::REQUEST_MAGIC, req1[:magic]
assert_equal ::FlexNBD::REQUEST_WRITE, req1[:type]
assert_equal 0, req1[:from]
assert_equal 4096, req1[:len]
data1 = sc1.read_data( 4096 )
assert_equal( ( "\xFF" * 4096 ), data1, "Data not proxied successfully" )
# Kill the server again, now we're sure the write request has been sent once
sc1.close
# We expect the proxy to reconnect without our client doing anything.
sc2 = server.accept
sc2.write_hello
# And once reconnected, it should resend an identical request.
req2 = sc2.read_request
assert_equal req1, req2
data2 = sc2.read_data( 4096 )
assert_equal data1, data2
# The reply should be proxied back to the client.
sc2.write_reply( req2[:handle] )
# Check it to make sure it's correct
rsp = timeout(15) { client.read_response }
assert_equal ::FlexNBD::REPLY_MAGIC, rsp[:magic]
assert_equal 0, rsp[:error]
assert_equal req1[:handle], rsp[:handle]
sc2.close
server.close
end
end
def test_only_one_client_can_connect_to_proxy_at_a_time
with_proxied_client do |client|
c2 = nil
assert_raises(Timeout::Error) do
timeout(1) do
c2 = FlexNBD::FakeSource.new(@env.ip, @env.port2, "Couldn't connect to proxy (2)")
c2.read_hello
end
end
c2.close rescue nil if c2
end
end
end

@@ -2,12 +2,17 @@
require 'test/unit' require 'test/unit'
require 'environment' require 'environment'
require 'flexnbd/constants'
class TestHappyPath < Test::Unit::TestCase class TestHappyPath < Test::Unit::TestCase
def setup def setup
@env = Environment.new @env = Environment.new
end end
def bin(str)
FlexNBD.binary str
end
def teardown def teardown
@env.nbd1.can_die(0) @env.nbd1.can_die(0)
@env.nbd2.can_die(0) @env.nbd2.can_die(0)
@@ -22,13 +27,13 @@ class TestHappyPath < Test::Unit::TestCase
[0, 12, 63].each do |num| [0, 12, 63].each do |num|
assert_equal( assert_equal(
@env.nbd1.read(num*@env.blocksize, @env.blocksize), bin( @env.nbd1.read(num*@env.blocksize, @env.blocksize) ),
@env.file1.read(num*@env.blocksize, @env.blocksize) bin( @env.file1.read(num*@env.blocksize, @env.blocksize) )
) )
end end
[124, 1200, 10028, 25488].each do |num| [124, 1200, 10028, 25488].each do |num|
assert_equal(@env.nbd1.read(num, 4), @env.file1.read(num, 4)) assert_equal(bin(@env.nbd1.read(num, 4)), bin(@env.file1.read(num, 4)))
end end
end end
@@ -102,7 +107,7 @@ class TestHappyPath < Test::Unit::TestCase
assert_no_match( /unrecognized/, stderr ) assert_no_match( /unrecognized/, stderr )
Timeout.timeout(2) do @env.nbd1.join end Timeout.timeout(10) do @env.nbd1.join end
assert !File.file?( @env.filename1 ) assert !File.file?( @env.filename1 )
end end

@@ -0,0 +1,22 @@
require 'test/unit'
require 'environment'
require 'proxy_tests'
class TestPrefetchProxyMode < Test::Unit::TestCase
include ProxyTests
def setup
super
@env = Environment.new
@env.prefetch_proxy!
@env.writefile1( "f" * 16 )
end
def teardown
@env.cleanup
super
end
end
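
The new test class gets all of its `test_*` methods from the included `ProxyTests` module and only supplies its own `setup`. A minimal sketch of that mixin pattern in plain Ruby, without `test/unit`, assuming a tiny hand-rolled runner. All names here (`SharedProxyChecks`, `PlainModeCase`, `PrefetchModeCase`, `run_case`) are hypothetical:

```ruby
# Shared checks live in a module; they rely on state set up by the includer.
module SharedProxyChecks
  def test_mode_is_configured
    raise "mode not configured" if @mode.nil?
    @mode
  end
end

# Each class contributes only its own setup and inherits the shared checks.
class PlainModeCase
  include SharedProxyChecks

  def setup
    @mode = :proxy
  end
end

class PrefetchModeCase
  include SharedProxyChecks

  def setup
    @mode = :prefetch_proxy
  end
end

# Toy runner: a fresh instance per test method, setup first, then the check.
def run_case(klass)
  klass.public_instance_methods.grep(/\Atest_/).sort.map do |name|
    instance = klass.new
    instance.setup
    [name, instance.public_send(name)]
  end
end

p run_case(PlainModeCase)
p run_case(PrefetchModeCase)
```

The same checks run once per including class, each time against that class's own setup, which is what lets the proxy and prefetch-proxy suites share one body of tests.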

@@ -1,200 +1,20 @@
require 'test/unit' require 'test/unit'
require 'environment' require 'environment'
require 'flexnbd/fake_source' require 'proxy_tests'
require 'flexnbd/fake_dest'
class TestProxyMode < Test::Unit::TestCase class TestProxyMode < Test::Unit::TestCase
include ProxyTests
def setup def setup
super super
@env = Environment.new @env = Environment.new
@env.writefile1( "0" * 16 ) @env.writefile1( "f" * 16 )
end end
def teardown def teardown
@env.cleanup @env.cleanup
super super
end end
def with_proxied_client( override_size = nil )
@env.serve1 unless @server_up
@env.proxy2 unless @proxy_up
@env.nbd2.can_die(0)
client = FlexNBD::FakeSource.new(@env.ip, @env.port2, "Couldn't connect to proxy")
begin
result = client.read_hello
assert_equal "NBDMAGIC", result[:magic]
assert_equal override_size || @env.file1.size, result[:size]
yield client
ensure
client.close rescue nil
end
end
def test_exits_with_error_when_cannot_connect_to_upstream_on_start
assert_raises(RuntimeError) { @env.proxy1 }
end
def test_read_requests_successfully_proxied
with_proxied_client do |client|
(0..3).each do |n|
offset = n * 4096
client.write_read_request(offset, 4096, "myhandle")
rsp = client.read_response
assert_equal ::FlexNBD::REPLY_MAGIC, rsp[:magic]
assert_equal "myhandle", rsp[:handle]
assert_equal 0, rsp[:error]
orig_data = @env.file1.read(offset, 4096)
data = client.read_raw(4096)
assert_equal 4096, orig_data.size
assert_equal 4096, data.size
assert_equal( orig_data, data, "Returned data does not match" )
end
end
end
def test_write_requests_successfully_proxied
with_proxied_client do |client|
(0..3).each do |n|
offset = n * 4096
client.write(offset, "\xFF" * 4096)
rsp = client.read_response
assert_equal FlexNBD::REPLY_MAGIC, rsp[:magic]
assert_equal "myhandle", rsp[:handle]
assert_equal 0, rsp[:error]
data = @env.file1.read(offset, 4096)
assert_equal( ( "\xFF" * 4096 ), data, "Data not written correctly (offset is #{n})" )
end
end
end
def make_fake_server
server = FlexNBD::FakeDest.new(@env.ip, @env.port1)
@server_up = true
# We return a thread here because accept() and connect() both block for us
Thread.new do
sc = server.accept # just tell the supervisor we're up
sc.write_hello
[ server, sc ]
end
end
def test_read_request_retried_when_upstream_dies_partway
maker = make_fake_server
with_proxied_client(4096) do |client|
server, sc1 = maker.value
# Send the read request to the proxy
client.write_read_request( 0, 4096 )
# ensure we're given the read request
req1 = sc1.read_request
assert_equal ::FlexNBD::REQUEST_MAGIC, req1[:magic]
assert_equal ::FlexNBD::REQUEST_READ, req1[:type]
assert_equal 0, req1[:from]
assert_not_equal 0, req1[:len]
# Kill the server again, now we're sure the read request has been sent once
sc1.close
# We expect the proxy to reconnect without our client doing anything.
sc2 = server.accept
sc2.write_hello
# And once reconnected, it should resend an identical request.
req2 = sc2.read_request
assert_equal req1, req2
# The reply should be proxied back to the client.
sc2.write_reply( req2[:handle] )
sc2.write_data( "\xFF" * 4096 )
# Check it to make sure it's correct
rsp = timeout(15) { client.read_response }
assert_equal ::FlexNBD::REPLY_MAGIC, rsp[:magic]
assert_equal 0, rsp[:error]
assert_equal req1[:handle], rsp[:handle]
data = client.read_raw( 4096 )
assert_equal( ("\xFF" * 4096), data, "Wrong data returned" )
sc2.close
server.close
end
end
def test_write_request_retried_when_upstream_dies_partway
maker = make_fake_server
with_proxied_client(4096) do |client|
server, sc1 = maker.value
# Send the read request to the proxy
client.write( 0, ( "\xFF" * 4096 ) )
# ensure we're given the read request
req1 = sc1.read_request
assert_equal ::FlexNBD::REQUEST_MAGIC, req1[:magic]
assert_equal ::FlexNBD::REQUEST_WRITE, req1[:type]
assert_equal 0, req1[:from]
assert_equal 4096, req1[:len]
data1 = sc1.read_data( 4096 )
assert_equal( ( "\xFF" * 4096 ), data1, "Data not proxied successfully" )
# Kill the server again, now we're sure the read request has been sent once
sc1.close
# We expect the proxy to reconnect without our client doing anything.
sc2 = server.accept
sc2.write_hello
# And once reconnected, it should resend an identical request.
req2 = sc2.read_request
assert_equal req1, req2
data2 = sc2.read_data( 4096 )
assert_equal data1, data2
# The reply should be proxied back to the client.
sc2.write_reply( req2[:handle] )
# Check it to make sure it's correct
rsp = timeout(15) { client.read_response }
assert_equal ::FlexNBD::REPLY_MAGIC, rsp[:magic]
assert_equal 0, rsp[:error]
assert_equal req1[:handle], rsp[:handle]
sc2.close
server.close
end
end
def test_only_one_client_can_connect_to_proxy_at_a_time
with_proxied_client do |client|
c2 = nil
assert_raises(Timeout::Error) do
timeout(1) do
c2 = FlexNBD::FakeSource.new(@env.ip, @env.port2, "Couldn't connect to proxy (2)")
c2.read_hello
end
end
c2.close rescue nil if c2
end
end
end end

@@ -7,6 +7,9 @@ require 'environment'
class TestSourceErrorHandling < Test::Unit::TestCase class TestSourceErrorHandling < Test::Unit::TestCase
def setup def setup
@old_env = ENV['FLEXNBD_MS_REQUEST_LIMIT_SECS']
ENV['FLEXNBD_MS_REQUEST_LIMIT_SECS'] = "4.0"
@env = Environment.new @env = Environment.new
@env.writefile1( "f" * 4 ) @env.writefile1( "f" * 4 )
@env.serve1 @env.serve1
@@ -16,6 +19,7 @@ class TestSourceErrorHandling < Test::Unit::TestCase
def teardown def teardown
@env.nbd1.can_die(0) @env.nbd1.can_die(0)
@env.cleanup @env.cleanup
ENV['FLEXNBD_MS_REQUEST_LIMIT_SECS'] = @old_env
end end
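
The setup/teardown pair above saves an environment variable, overrides it, and puts the old value back. The same idea can be sketched as a block helper with `ensure`, so the restore happens even if the body raises. `with_env` and the variable name `FLEXNBD_SKETCH_VAR` are hypothetical and used only for illustration:

```ruby
# Temporarily set ENV[name] to value for the duration of the block.
# Assigning nil to an ENV key deletes it, so a previously unset
# variable ends up unset again afterwards.
def with_env(name, value)
  old = ENV[name]
  ENV[name] = value
  yield
ensure
  ENV[name] = old
end

ENV.delete("FLEXNBD_SKETCH_VAR")

seen_inside = with_env("FLEXNBD_SKETCH_VAR", "4.0") do
  ENV["FLEXNBD_SKETCH_VAR"]
end

puts seen_inside.inspect                 # value visible inside the block
puts ENV["FLEXNBD_SKETCH_VAR"].inspect   # restored (unset) afterwards
```

In a `Test::Unit` case the save and restore are split across `setup` and `teardown` as in the hunk, since the framework calls `teardown` whether or not the test passed.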

@@ -10,7 +10,7 @@
START_TEST(test_bit_set) START_TEST(test_bit_set)
{ {
uint64_t num = 0; uint64_t num = 0;
char *bits = (char*) &num; bitfield_p bits = (bitfield_p) &num;
#define TEST_BIT_SET(bit, newvalue) \ #define TEST_BIT_SET(bit, newvalue) \
bit_set(bits, (bit)); \ bit_set(bits, (bit)); \
@@ -27,7 +27,7 @@ END_TEST
START_TEST(test_bit_clear) START_TEST(test_bit_clear)
{ {
uint64_t num = 0xffffffffffffffff; uint64_t num = 0xffffffffffffffff;
char *bits = (char*) &num; bitfield_p bits = (bitfield_p) &num;
#define TEST_BIT_CLEAR(bit, newvalue) \ #define TEST_BIT_CLEAR(bit, newvalue) \
bit_clear(bits, (bit)); \ bit_clear(bits, (bit)); \
@@ -44,7 +44,7 @@ END_TEST
START_TEST(test_bit_tests) START_TEST(test_bit_tests)
{ {
uint64_t num = 0x5555555555555555; uint64_t num = 0x5555555555555555;
char *bits = (char*) &num; bitfield_p bits = (bitfield_p) &num;
fail_unless(bit_has_value(bits, 0, 1), "bit_has_value malfunction"); fail_unless(bit_has_value(bits, 0, 1), "bit_has_value malfunction");
fail_unless(bit_has_value(bits, 1, 0), "bit_has_value malfunction"); fail_unless(bit_has_value(bits, 1, 0), "bit_has_value malfunction");
@@ -58,7 +58,7 @@ END_TEST
START_TEST(test_bit_ranges) START_TEST(test_bit_ranges)
{ {
char buffer[4160]; bitfield_word_t buffer[BIT_WORDS_FOR_SIZE(4160)];
uint64_t *longs = (unsigned long*) buffer; uint64_t *longs = (unsigned long*) buffer;
uint64_t i; uint64_t i;
@@ -84,7 +84,7 @@ END_TEST
START_TEST(test_bit_runs) START_TEST(test_bit_runs)
{ {
char buffer[256]; bitfield_word_t buffer[BIT_WORDS_FOR_SIZE(256)];
int i, ptr=0, runs[] = { int i, ptr=0, runs[] = {
56,97,22,12,83,1,45,80,85,51,64,40,63,67,75,64,94,81,79,62 56,97,22,12,83,1,45,80,85,51,64,40,63,67,75,64,94,81,79,62
}; };

@@ -76,8 +76,8 @@ START_TEST( test_read_request_quits_on_stop_signal )
client_signal_stop( c ); client_signal_stop( c );
int client_read_request( struct client *, struct nbd_request *); int client_serve_request( struct client *);
fail_unless( 0 == client_read_request( c, &nbdr ), "Didn't quit on stop." ); fail_unless( 1 == client_serve_request( c ), "Didn't quit on stop." );
close( fds[0] ); close( fds[0] );
close( fds[1] ); close( fds[1] );

@@ -57,7 +57,7 @@ void * responder( void *respond_uncast )
fd_write_reply( sock_fd, wrong_handle, 0 ); fd_write_reply( sock_fd, wrong_handle, 0 );
} }
else { else {
fd_write_reply( sock_fd, resp->received.handle, 0 ); fd_write_reply( sock_fd, (char*)resp->received.handle.b, 0 );
} }
write( sock_fd, "12345678", 8 ); write( sock_fd, "12345678", 8 );
} }

@@ -93,7 +93,7 @@ END_TEST
int connect_client( char *addr, int actual_port, char *source_addr ) int connect_client( char *addr, int actual_port, char *source_addr )
{ {
int client_fd; int client_fd = -1;
struct addrinfo hint; struct addrinfo hint;
struct addrinfo *ailist, *aip; struct addrinfo *ailist, *aip;

@@ -72,9 +72,11 @@ START_TEST( test_sockaddr_address_string_doesnt_overflow_short_buffer )
char testbuf[128]; char testbuf[128];
const char* result; const char* result;
memset( testbuf, 0, 128 );
v4->sin_family = AF_INET; v4->sin_family = AF_INET;
v4->sin_port = htons( 4777 ); v4->sin_port = htons( 4777 );
ck_assert_int_eq( 1, inet_pton( AF_INET, "192.168.0.1", &v4->sin_addr )); ck_assert_int_eq( 1, inet_pton( AF_INET, "192.168.0.1", &v4->sin_addr ));
memset( &testbuf, 0, 128 );
result = sockaddr_address_string( &sa, &testbuf[0], 2 ); result = sockaddr_address_string( &sa, &testbuf[0], 2 );
ck_assert( result == NULL ); ck_assert( result == NULL );

@@ -71,11 +71,12 @@ START_TEST( test_fatal_kills_process )
sleep(10); sleep(10);
} }
else { else {
int kidstatus; int kidret, kidstatus, result;
int result; result = waitpid( pid, &kidret, 0 );
result = waitpid( pid, &kidstatus, 0 );
fail_if( result < 0, "Wait failed." ); fail_if( result < 0, "Wait failed." );
fail_unless( kidstatus == 6, "Kid was not aborted." ); fail_unless( WIFSIGNALED( kidret ), "Process didn't exit via signal" );
kidstatus = WTERMSIG( kidret );
ck_assert_int_eq( kidstatus, SIGABRT );
} }
} }