qemu.git
29 hours ago Merge remote-tracking branch 'remotes/palmer/tags/riscv-for-master-5.0-sf1' into... master github/master
Peter Maydell [Fri, 24 Jan 2020 12:34:04 +0000 (12:34 +0000)] 
Merge remote-tracking branch 'remotes/palmer/tags/riscv-for-master-5.0-sf1' into staging

RISC-V Patches for the 5.0 Soft Freeze, Part 1

This patch set contains a handful of collected fixes that I'd like to target
for the 5.0 soft freeze (I know that's a long way away, I just don't know what
else to call these):

* A fix for a memory leak initializing the sifive_u board.
* Fixes to privilege mode emulation related to interrupts and fstatus.

Notably absent is the H extension implementation.  That's pretty much reviewed,
but not quite ready to go yet and I didn't want to hold back these important
fixes.  This boots 32-bit and 64-bit Linux (buildroot this time, just for fun)
and passes "make check".

# gpg: Signature made Tue 21 Jan 2020 22:55:28 GMT
# gpg:                using RSA key 2B3C3747446843B24A943A7A2E1319F35FBB1889
# gpg:                issuer "palmer@dabbelt.com"
# gpg: Good signature from "Palmer Dabbelt <palmer@dabbelt.com>" [unknown]
# gpg:                 aka "Palmer Dabbelt <palmer@sifive.com>" [unknown]
# gpg:                 aka "Palmer Dabbelt <palmerdabbelt@google.com>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 00CE 76D1 8349 60DF CE88  6DF8 EF4C A150 2CCB AB41
#      Subkey fingerprint: 2B3C 3747 4468 43B2 4A94  3A7A 2E13 19F3 5FBB 1889

* remotes/palmer/tags/riscv-for-master-5.0-sf1:
  target/riscv: update mstatus.SD when FS is set dirty
  target/riscv: fsd/fsw doesn't dirty FP state
  target/riscv: Fix tb->flags FS status
  riscv: Set xPIE to 1 after xRET
  riscv/sifive_u: fix a memory leak in soc_realize()

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
32 hours ago Merge remote-tracking branch 'remotes/dgilbert-gitlab/tags/pull-virtiofs-20200123b...
Peter Maydell [Fri, 24 Jan 2020 09:59:11 +0000 (09:59 +0000)] 
Merge remote-tracking branch 'remotes/dgilbert-gitlab/tags/pull-virtiofs-20200123b' into staging

virtiofsd first pull v2

Import our virtiofsd.
This pulls in the daemon to drive a file system connected to the
existing qemu virtiofsd device.
It's derived from upstream libfuse with lots of changes (and a lot
trimmed out).
The daemon lives in the newly created qemu/tools/virtiofsd

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
v2
  drop the docs while we discuss where they should live
  and we need to redo the manpage in anything but texi

# gpg: Signature made Thu 23 Jan 2020 16:45:18 GMT
# gpg:                using RSA key 45F5C71B4A0CB7FB977A9FA90516331EBC5BFDE7
# gpg: Good signature from "Dr. David Alan Gilbert (RH2) <dgilbert@redhat.com>" [full]
# Primary key fingerprint: 45F5 C71B 4A0C B7FB 977A  9FA9 0516 331E BC5B FDE7

* remotes/dgilbert-gitlab/tags/pull-virtiofs-20200123b: (108 commits)
  virtiofsd: add some options to the help message
  virtiofsd: stop all queue threads on exit in virtio_loop()
  virtiofsd/passthrough_ll: Pass errno to fuse_reply_err()
  virtiofsd: Convert lo_destroy to take the lo->mutex lock itself
  virtiofsd: add --thread-pool-size=NUM option
  virtiofsd: fix lo_destroy() resource leaks
  virtiofsd: prevent FUSE_INIT/FUSE_DESTROY races
  virtiofsd: process requests in a thread pool
  virtiofsd: use fuse_buf_writev to replace fuse_buf_write for better performance
  virtiofsd: add definition of fuse_buf_writev()
  virtiofsd: passthrough_ll: Use cache_readdir for directory open
  virtiofsd: Fix data corruption with O_APPEND write in writeback mode
  virtiofsd: Reset O_DIRECT flag during file open
  virtiofsd: convert more fprintf and perror to use fuse log infra
  virtiofsd: do not always set FUSE_FLOCK_LOCKS
  virtiofsd: introduce inode refcount to prevent use-after-free
  virtiofsd: passthrough_ll: fix refcounting on remove/rename
  libvhost-user: Fix some memtable remap cases
  virtiofsd: rename inode->refcount to inode->nlookup
  virtiofsd: prevent races with lo_dirp_put()
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
47 hours ago Merge remote-tracking branch 'remotes/kraxel/tags/ui-20200123-pull-request' into...
Peter Maydell [Thu, 23 Jan 2020 18:44:39 +0000 (18:44 +0000)] 
Merge remote-tracking branch 'remotes/kraxel/tags/ui-20200123-pull-request' into staging

vnc: fix zlib compression artifacts.
ui: add "none" to -display help.

# gpg: Signature made Thu 23 Jan 2020 14:20:53 GMT
# gpg:                using RSA key 4CB6D8EED3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>" [full]
# gpg:                 aka "Gerd Hoffmann <gerd@kraxel.org>" [full]
# gpg:                 aka "Gerd Hoffmann (private) <kraxel@gmail.com>" [full]
# Primary key fingerprint: A032 8CFF B93A 17A7 9901  FE7D 4CB6 D8EE D3E8 7138

* remotes/kraxel/tags/ui-20200123-pull-request:
  ui/console: Display the 'none' backend in '-display help'
  vnc: prioritize ZRLE compression over ZLIB
  Revert "vnc: allow fall back to RAW encoding"

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2 days ago virtiofsd: add some options to the help message
Masayoshi Mizuma [Wed, 18 Dec 2019 20:08:31 +0000 (15:08 -0500)] 
virtiofsd: add some options to the help message

Add the following options to the help message:
- cache
- flock|no_flock
- norace
- posix_lock|no_posix_lock
- readdirplus|no_readdirplus
- timeout
- writeback|no_writeback
- xattr|no_xattr

Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
dgilbert: Split cache, norace, posix_lock, readdirplus off
  into our own earlier patches that added the options

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days ago virtiofsd: stop all queue threads on exit in virtio_loop()
Eryu Guan [Tue, 7 Jan 2020 04:15:21 +0000 (12:15 +0800)] 
virtiofsd: stop all queue threads on exit in virtio_loop()

On graceful guest shutdown, virtiofsd receives a VHOST_USER_GET_VRING_BASE
request from the VMM and shuts down the virtqueues by calling fv_set_started(),
which joins the fv_queue_thread() threads. So when virtio_loop() returns,
no thread should still be accessing data in the fuse session and/or
virtio dev.

But on abnormal exit, e.g. when the guest is killed for whatever reason,
the vhost-user socket is closed and virtio_loop() breaks out of the main loop
and returns to main(). It is possible that fv_queue_worker()s are still
working and accessing the fuse session and virtio dev, which results in
a crash or use-after-free.

Fix it by stopping the fv_queue_thread()s before virtio_loop() returns,
so that nothing can still access the fuse session and virtio dev.

Reported-by: Qingming Su <qingming.su@linux.alibaba.com>
Signed-off-by: Eryu Guan <eguan@linux.alibaba.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days ago virtiofsd/passthrough_ll: Pass errno to fuse_reply_err()
Xiao Yang [Thu, 2 Jan 2020 03:53:12 +0000 (11:53 +0800)] 
virtiofsd/passthrough_ll: Pass errno to fuse_reply_err()

lo_copy_file_range() passes -errno to fuse_reply_err(), and fuse_reply_err()
then negates it back to a positive errno, so the subsequent
fuse_send_reply_iov_nofree() sees the wrong value (i.e. it reports
"fuse: bad error value: ...").

Make fuse_send_reply_iov_nofree() receive the correct -errno by passing errno
directly in lo_copy_file_range().
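
The sign convention described above can be modelled with a small sketch. The
names sketch_send_reply() and sketch_reply_err() are hypothetical stand-ins,
not the libfuse functions, and the range check is an assumption about how the
send path validates its argument:

```c
#include <errno.h>

/*
 * Hypothetical stand-in for the low-level send path: the value put on
 * the wire must be 0 or a small negative errno. Anything else is
 * treated as a bad error value and replaced by -ERANGE.
 */
static int sketch_send_reply(int error)
{
    if (error <= -1000 || error > 0) {
        return -ERANGE;
    }
    return error;
}

/*
 * Hypothetical stand-in for the reply helper: callers pass a positive
 * errno and the helper negates it exactly once.
 */
static int sketch_reply_err(int err)
{
    return sketch_send_reply(-err);
}
```

Passing EINVAL yields -EINVAL on the wire, while passing -EINVAL is negated
into a positive value and rejected, which is the failure mode the commit
message describes.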

Signed-off-by: Xiao Yang <yangx.jy@cn.fujitsu.com>
Reviewed-by: Eryu Guan <eguan@linux.alibaba.com>
dgilbert: Sent upstream and now Merged as aa1185e153f774f1df65
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days ago virtiofsd: Convert lo_destroy to take the lo->mutex lock itself
Dr. David Alan Gilbert [Fri, 23 Aug 2019 14:39:24 +0000 (15:39 +0100)] 
virtiofsd: Convert lo_destroy to take the lo->mutex lock itself

lo_destroy was relying on some implicit knowledge of the locking;
we can avoid this if we create an unref_inode that doesn't take
the lock and then grab it for the whole of the lo_destroy.

Suggested-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days ago virtiofsd: add --thread-pool-size=NUM option
Stefan Hajnoczi [Thu, 1 Aug 2019 16:54:09 +0000 (17:54 +0100)] 
virtiofsd: add --thread-pool-size=NUM option

Add an option to control the size of the thread pool.  Requests are now
processed in parallel by default.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days ago virtiofsd: fix lo_destroy() resource leaks
Stefan Hajnoczi [Thu, 1 Aug 2019 16:54:08 +0000 (17:54 +0100)] 
virtiofsd: fix lo_destroy() resource leaks

Now that lo_destroy() is serialized we can call unref_inode() so that
all inode resources are freed.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days ago virtiofsd: prevent FUSE_INIT/FUSE_DESTROY races
Stefan Hajnoczi [Thu, 1 Aug 2019 16:54:07 +0000 (17:54 +0100)] 
virtiofsd: prevent FUSE_INIT/FUSE_DESTROY races

When running with multiple threads it can be tricky to handle
FUSE_INIT/FUSE_DESTROY in parallel with other request types or in
parallel with themselves.  Serialize FUSE_INIT and FUSE_DESTROY so that
malicious clients cannot trigger race conditions.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days ago virtiofsd: process requests in a thread pool
Stefan Hajnoczi [Thu, 1 Aug 2019 16:54:06 +0000 (17:54 +0100)] 
virtiofsd: process requests in a thread pool

Introduce a thread pool so that fv_queue_thread() just pops
VuVirtqElements and hands them to the thread pool.  For the time being
only one worker thread is allowed since passthrough_ll.c is not
thread-safe yet.  Future patches will lift this restriction so that
multiple FUSE requests can be processed in parallel.

The main new concept is struct FVRequest, which contains both
VuVirtqElement and struct fuse_chan.  We now have fv_VuDev for a device,
fv_QueueInfo for a virtqueue, and FVRequest for a request.  Some of
fv_QueueInfo's fields are moved into FVRequest because they are
per-request.  The name FVRequest conforms to QEMU coding style and I
expect the struct fv_* types will be renamed in a future refactoring.

This patch series is not optimal.  fbuf reuse is dropped so each request
does malloc(se->bufsize), but there is no clean and cheap way to keep
this with a thread pool.  The vq_lock mutex is held for longer than
necessary, especially during the eventfd_write() syscall.  Performance
can be improved in the future.

prctl(2) had to be added to the seccomp whitelist because glib invokes
it.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days ago virtiofsd: use fuse_buf_writev to replace fuse_buf_write for better performance
piaojun [Fri, 16 Aug 2019 03:42:21 +0000 (11:42 +0800)] 
virtiofsd: use fuse_buf_writev to replace fuse_buf_write for better performance

fuse_buf_writev() only handles the normal write, in which src is a buffer
and dest is an fd. In particular, if the src buffer represents a guest physical
address that can't be mapped by the daemon process, the IO must be bounced
back to the VMM via fuse_buf_copy().

Signed-off-by: Jun Piao <piaojun@huawei.com>
Suggested-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days ago virtiofsd: add definition of fuse_buf_writev()
piaojun [Fri, 16 Aug 2019 03:41:16 +0000 (11:41 +0800)] 
virtiofsd: add definition of fuse_buf_writev()

Define fuse_buf_writev(), which uses pwritev and writev to improve IO
bandwidth. In particular, src bufs with 0 size should be skipped, as
their memory is not *block_size* aligned, which would cause writev to fail
in direct IO mode.
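
A minimal sketch of the "skip zero-size buffers, then issue one writev()"
idea follows. sketch_writev_skip_empty() is a hypothetical helper written for
illustration; it is not the fuse_buf_writev() added by this patch and it
leaves out the pwritev() (explicit offset) case:

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/uio.h>

/*
 * Gather the non-empty source buffers into a fresh iovec array and
 * write them with a single writev() call. Zero-length entries are
 * dropped so that their (possibly unaligned) base pointers never reach
 * the kernel. Returns the number of bytes written, or -1 on error.
 */
static ssize_t sketch_writev_skip_empty(int fd, const struct iovec *src,
                                        int count)
{
    struct iovec *iov;
    ssize_t res;
    int used = 0;
    int i;

    if (count <= 0) {
        return 0;
    }

    iov = calloc((size_t)count, sizeof(*iov));
    if (!iov) {
        return -1;
    }

    for (i = 0; i < count; i++) {
        if (src[i].iov_len == 0) {
            continue; /* nothing to write for this buffer */
        }
        iov[used++] = src[i];
    }

    res = used ? writev(fd, iov, used) : 0;
    free(iov);
    return res;
}
```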

Signed-off-by: Jun Piao <piaojun@huawei.com>
Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days ago virtiofsd: passthrough_ll: Use cache_readdir for directory open
Misono Tomohiro [Mon, 20 Jan 2020 02:53:30 +0000 (11:53 +0900)] 
virtiofsd: passthrough_ll: Use cache_readdir for directory open

Since keep_cache (FOPEN_KEEP_CACHE) has no effect for directories, as
described in fuse_common.h, use cache_readdir (FOPEN_CACHE_DIR) for
directory open in cache=always mode.

Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days ago virtiofsd: Fix data corruption with O_APPEND write in writeback mode
Misono Tomohiro [Wed, 23 Oct 2019 12:25:23 +0000 (21:25 +0900)] 
virtiofsd: Fix data corruption with O_APPEND write in writeback mode

When writeback mode is enabled (-o writeback), O_APPEND handling is
done in the kernel. Therefore virtiofsd clears the O_APPEND flag on open.
Otherwise the O_APPEND flag takes precedence over pwrite() and the written
data may be corrupted.

Currently clearing O_APPEND flag is done in lo_open(), but we also
need the same operation in lo_create(). So, factor out the flag
update operation in lo_open() to update_open_flags() and call it
in both lo_open() and lo_create().

This fixes the failure of xfstest generic/069 in writeback mode
(which tests O_APPEND write data integrity).
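
The shared flag adjustment can be sketched as below. The function name
sketch_update_open_flags() and its int writeback parameter are assumptions
made for illustration; the real update_open_flags() may take different
arguments and adjust more flags than O_APPEND:

```c
#include <fcntl.h>

/*
 * Return the open(2) flags the daemon should use on the host.
 * In writeback mode the guest kernel already resolves the append
 * offset, so O_APPEND is stripped to keep pwrite() offsets meaningful.
 */
static int sketch_update_open_flags(int writeback, int flags)
{
    if (writeback && (flags & O_APPEND)) {
        flags &= ~O_APPEND;
    }
    return flags;
}
```

Calling the same helper from both the open and the create path is what keeps
the two code paths from diverging again.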

Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days ago virtiofsd: Reset O_DIRECT flag during file open
Vivek Goyal [Tue, 20 Aug 2019 18:37:46 +0000 (14:37 -0400)] 
virtiofsd: Reset O_DIRECT flag during file open

If an application wants to do direct IO and opens a file with O_DIRECT
in guest, that does not necessarily mean that we need to bypass page
cache on host as well. So reset this flag on host.

If somebody needs to bypass page cache on host as well (and it is safe to
do so), we can add a knob in daemon later to control this behavior.

I checked virtio-9p and it does reset the O_DIRECT flag.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days ago virtiofsd: convert more fprintf and perror to use fuse log infra
Eryu Guan [Fri, 9 Aug 2019 08:25:36 +0000 (16:25 +0800)] 
virtiofsd: convert more fprintf and perror to use fuse log infra

Signed-off-by: Eryu Guan <eguan@linux.alibaba.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days ago virtiofsd: do not always set FUSE_FLOCK_LOCKS
Peng Tao [Fri, 2 Aug 2019 11:12:23 +0000 (19:12 +0800)] 
virtiofsd: do not always set FUSE_FLOCK_LOCKS

Right now we always enable it regardless of the given command line.
Fix it by setting the flag based on the lo->flock bit.
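
As a rough sketch of the fixed behaviour: the capability is requested only
when the option is on. struct sketch_conn, SKETCH_CAP_FLOCK_LOCKS and its bit
value are hypothetical stand-ins for the FUSE connection info, and the extra
check against the advertised capabilities is an assumption:

```c
#include <stdbool.h>

/* Hypothetical capability bit; the real constant lives in the FUSE headers. */
#define SKETCH_CAP_FLOCK_LOCKS (1u << 10)

/* Hypothetical, reduced view of the connection negotiation state. */
struct sketch_conn {
    unsigned capable; /* what the client offers */
    unsigned want;    /* what the daemon asks for */
};

/* Request flock support only if the user enabled it and the client has it. */
static void sketch_init_flock(struct sketch_conn *conn, bool flock_enabled)
{
    if (flock_enabled && (conn->capable & SKETCH_CAP_FLOCK_LOCKS)) {
        conn->want |= SKETCH_CAP_FLOCK_LOCKS;
    }
}
```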

Signed-off-by: Peng Tao <tao.peng@linux.alibaba.com>
Reviewed-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days ago virtiofsd: introduce inode refcount to prevent use-after-free
Stefan Hajnoczi [Wed, 31 Jul 2019 16:10:06 +0000 (17:10 +0100)] 
virtiofsd: introduce inode refcount to prevent use-after-free

If thread A is using an inode it must not be deleted by thread B when
processing a FUSE_FORGET request.

The FUSE protocol itself already has a counter called nlookup that is
used in FUSE_FORGET messages.  We cannot trust this counter since the
untrusted client can manipulate it via FUSE_FORGET messages.

Introduce a new refcount to keep inodes alive for the required lifespan.
lo_inode_put() must be called to release a reference.  FUSE's nlookup
counter holds exactly one reference so that the inode stays alive as
long as the client still wants to remember it.

Note that the lo_inode->is_symlink field is moved to avoid creating a
hole in the struct due to struct field alignment.
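
The lifetime rule can be illustrated with a small reference-counting sketch.
struct sketch_inode and the sketch_inode_*() helpers are hypothetical and use
C11 atomics for brevity; they are not the lo_inode code from this patch:

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical, reduced inode: only the fields the sketch needs. */
struct sketch_inode {
    atomic_int refcount;
    int fd;
};

/* A new inode starts with one reference, owned by the caller. */
static struct sketch_inode *sketch_inode_new(int fd)
{
    struct sketch_inode *inode = malloc(sizeof(*inode));

    if (!inode) {
        return NULL;
    }
    atomic_init(&inode->refcount, 1);
    inode->fd = fd;
    return inode;
}

/* Take an extra reference before using the inode from another thread. */
static void sketch_inode_get(struct sketch_inode *inode)
{
    atomic_fetch_add(&inode->refcount, 1);
}

/*
 * Drop a reference and clear the caller's pointer. The inode is freed
 * only when the last reference goes away. Returns 1 if it was freed.
 */
static int sketch_inode_put(struct sketch_inode **inodep)
{
    struct sketch_inode *inode = *inodep;

    if (!inode) {
        return 0;
    }
    *inodep = NULL;
    if (atomic_fetch_sub(&inode->refcount, 1) == 1) {
        free(inode);
        return 1;
    }
    return 0;
}
```

With this shape, a forget request that drops the client's reference cannot
free an inode that a worker thread still holds.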

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days ago virtiofsd: passthrough_ll: fix refcounting on remove/rename
Miklos Szeredi [Wed, 12 Sep 2018 10:25:42 +0000 (12:25 +0200)] 
virtiofsd: passthrough_ll: fix refcounting on remove/rename

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days ago libvhost-user: Fix some memtable remap cases
Dr. David Alan Gilbert [Mon, 12 Aug 2019 16:35:19 +0000 (17:35 +0100)] 
libvhost-user: Fix some memtable remap cases

If a new setmemtable command comes in once the vhost threads are
running, it will remap the guest's address space and the threads
will now be looking in the wrong place.

Fortunately we're running this command under lock, so we can
update the queue mappings so that threads will look in the new-right
place.

Note: This doesn't fix things that the threads might be doing
without a lock (e.g. a readv/writev!)  That's for another time.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days ago virtiofsd: rename inode->refcount to inode->nlookup
Stefan Hajnoczi [Wed, 31 Jul 2019 16:10:04 +0000 (17:10 +0100)] 
virtiofsd: rename inode->refcount to inode->nlookup

This reference counter plays a specific role in the FUSE protocol.  It's
not a generic object reference counter and the FUSE kernel code calls it
"nlookup".

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days ago virtiofsd: prevent races with lo_dirp_put()
Stefan Hajnoczi [Fri, 26 Jul 2019 09:11:03 +0000 (10:11 +0100)] 
virtiofsd: prevent races with lo_dirp_put()

Introduce lo_dirp_put() so that FUSE_RELEASEDIR does not cause
use-after-free races with other threads that are accessing lo_dirp.

Also make lo_releasedir() atomic to prevent FUSE_RELEASEDIR racing with
itself.  This prevents double-frees.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days ago virtiofsd: make lo_release() atomic
Stefan Hajnoczi [Fri, 26 Jul 2019 09:11:01 +0000 (10:11 +0100)] 
virtiofsd: make lo_release() atomic

Hold the lock across both lo_map_get() and lo_map_remove() to prevent
races between two FUSE_RELEASE requests.  In this case I don't see a
serious bug but it's safer to do things atomically.
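
A minimal sketch of "look up and remove under one lock" is shown below. The
fixed-size sketch_map array and the sketch_*() functions are hypothetical and
only stand in for the lo_map helpers named above:

```c
#include <pthread.h>

#define SKETCH_MAP_SIZE 4

static pthread_mutex_t sketch_map_lock = PTHREAD_MUTEX_INITIALIZER;
/* Hypothetical handle table: -1 marks an unused slot. */
static int sketch_map[SKETCH_MAP_SIZE] = { -1, -1, -1, -1 };

/* Store an fd under a handle; out-of-range handles are ignored. */
static void sketch_map_set(int handle, int fd)
{
    if (handle < 0 || handle >= SKETCH_MAP_SIZE) {
        return;
    }
    pthread_mutex_lock(&sketch_map_lock);
    sketch_map[handle] = fd;
    pthread_mutex_unlock(&sketch_map_lock);
}

/*
 * Look up and remove the entry in one critical section, so two
 * concurrent releases of the same handle cannot both see the fd.
 * Returns the fd to close, or -1 if the handle was already released.
 */
static int sketch_release(int handle)
{
    int fd;

    if (handle < 0 || handle >= SKETCH_MAP_SIZE) {
        return -1;
    }
    pthread_mutex_lock(&sketch_map_lock);
    fd = sketch_map[handle];  /* lookup */
    sketch_map[handle] = -1;  /* removal, still under the lock */
    pthread_mutex_unlock(&sketch_map_lock);
    return fd;
}
```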

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days ago virtiofsd: prevent fv_queue_thread() vs virtio_loop() races
Stefan Hajnoczi [Wed, 17 Jul 2019 15:05:57 +0000 (16:05 +0100)] 
virtiofsd: prevent fv_queue_thread() vs virtio_loop() races

We call into libvhost-user from the virtqueue handler thread and the
vhost-user message processing thread without a lock.  There is nothing
protecting the virtqueue handler thread if the vhost-user message
processing thread changes the virtqueue or memory table while it is
running.

This patch introduces a read-write lock.  Virtqueue handler threads are
readers.  The vhost-user message processing thread is a writer.  This
will allow concurrency for multiqueue in the future while protecting
against fv_queue_thread() vs virtio_loop() races.

Note that the critical sections could be made smaller but it would be
more invasive and require libvhost-user changes.  Let's start simple and
improve performance later, if necessary.  Another option would be an
RCU-style approach with lighter-weight primitives.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days ago virtiofsd: use fuse_lowlevel_is_virtio() in fuse_session_destroy()
Stefan Hajnoczi [Wed, 26 Jun 2019 16:51:32 +0000 (17:51 +0100)] 
virtiofsd: use fuse_lowlevel_is_virtio() in fuse_session_destroy()

vu_socket_path is NULL when --fd=FDNUM was used.  Use
fuse_lowlevel_is_virtio() instead.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days ago virtiofsd: Support remote posix locks
Vivek Goyal [Mon, 10 Jun 2019 19:22:06 +0000 (15:22 -0400)] 
virtiofsd: Support remote posix locks

Taking posix locks within the guest kernel is not sufficient if a file/dir
is being shared by multiple guests. So we need the notion of the daemon taking
the locks, which are then visible to the rest of the guests.

Given that posix locks are per process, one cannot call the posix lock API on
the host, otherwise a bunch of basic posix lock properties are broken. For
example, if two processes (A and B) in the guest open the file and take locks
on different sections of the file, and one of the processes closes the fd, it
will close the fd on virtiofsd and all posix locks on the file will go away.
This means if process A closes the fd, then the locks of process B will go
away too.

Other similar problems exist too.

This patch set tries to emulate posix locks while using open file
description locks provided on Linux.

Daemon provides two options (-o posix_lock, -o no_posix_lock) to enable
or disable posix locking in daemon. By default it is enabled.

There are a few issues though.

- GETLK() returns the pid of the process holding the lock. As we are emulating
  locks using OFD locks, which are not per process and don't return the pid
  of a process, GETLK() in the guest does not return the process pid.

- As of now only F_SETLK is supported and not F_SETLKW. We can't block
  the thread in virtiofsd for an arbitrarily long duration as there is only
  one thread serving the queue. That means the unlock request would not make
  it to the daemon, and F_SETLKW would block indefinitely and bring virtio-fs
  to a halt. This is a solvable problem though, and will require significant
  changes in virtiofsd and the kernel. Left as a TODO item for now.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days ago Virtiofsd: fix memory leak on fuse queueinfo
Liu Bo [Mon, 24 Jun 2019 21:53:47 +0000 (05:53 +0800)] 
Virtiofsd: fix memory leak on fuse queueinfo

For fuse's queueinfo, both queueinfo array and queueinfos are allocated in
fv_queue_set_started() but not cleaned up when the daemon process quits.

This fixes the leak in proper places.

Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
Signed-off-by: Eric Ren <renzhen@linux.alibaba.com>
Reviewed-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days ago virtiofsd: fix incorrect error handling in lo_do_lookup
Eric Ren [Tue, 11 Jun 2019 13:44:40 +0000 (21:44 +0800)] 
virtiofsd: fix incorrect error handling in lo_do_lookup

Signed-off-by: Eric Ren <renzhen@linux.alibaba.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days ago virtiofsd: enable PARALLEL_DIROPS during INIT
Liu Bo [Fri, 7 Jun 2019 02:38:18 +0000 (10:38 +0800)] 
virtiofsd: enable PARALLEL_DIROPS during INIT

Lookup is a read-only operation, so PARALLEL_DIROPS can be enabled.

Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days ago virtiofsd: Prevent multiply running with same vhost_user_socket
Masayoshi Mizuma [Tue, 13 Aug 2019 20:06:45 +0000 (16:06 -0400)] 
virtiofsd: Prevent multiply running with same vhost_user_socket

Multiple virtiofsd instances can run even if they use the same vhost_user_socket path.

  ]# ./virtiofsd -o vhost_user_socket=/tmp/vhostqemu -o source=/tmp/share &
  [1] 244965
  virtio_session_mount: Waiting for vhost-user socket connection...
  ]# ./virtiofsd -o vhost_user_socket=/tmp/vhostqemu -o source=/tmp/share &
  [2] 244966
  virtio_session_mount: Waiting for vhost-user socket connection...
  ]#

The user will get confused by this situation, and it may be the cause of
unexpected problems. So it's better to prevent multiple instances from running.

Create a regular file under the localstatedir directory to claim exclusive
use of the vhost_user_socket. To create and lock the file, use
qemu_write_pidfile() because that API has some sanity checks and file locking.
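
The single-instance idea can be sketched with a lock file. This sketch uses
flock(2) as a swapped-in mechanism rather than qemu_write_pidfile(), whose
internals are not shown here; sketch_lock_instance() and the lock path are
hypothetical:

```c
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Try to become the only instance tied to lock_path.
 * Returns an fd that holds the lock for as long as it stays open,
 * or -1 if the file cannot be opened or another holder exists.
 */
static int sketch_lock_instance(const char *lock_path)
{
    int fd = open(lock_path, O_CREAT | O_RDWR, 0600);

    if (fd < 0) {
        return -1;
    }
    /* Non-blocking exclusive lock: fail at once if someone holds it. */
    if (flock(fd, LOCK_EX | LOCK_NB) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A second daemon started with the same lock path would fail in
sketch_lock_instance() and could exit with an error instead of silently
waiting on the same socket.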

Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
  Applied fixes from Stefan's review and moved osdep include
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days ago virtiofsd: add helper for lo_data cleanup
Liu Bo [Thu, 6 Jun 2019 21:43:56 +0000 (05:43 +0800)] 
virtiofsd: add helper for lo_data cleanup

This adds a helper function for lo_data's cleanup.

Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days ago virtiofsd: fix memory leak on lo.source
Liu Bo [Thu, 6 Jun 2019 21:43:53 +0000 (05:43 +0800)] 
virtiofsd: fix memory leak on lo.source

valgrind reported that lo.source is leaked on quitting, but it was defined
as (const char*) as it may point to a const string "/".

Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
Reviewed-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days ago virtiofsd: cleanup allocated resource in se
Liu Bo [Thu, 6 Jun 2019 21:43:52 +0000 (05:43 +0800)] 
virtiofsd: cleanup allocated resource in se

This cleans up unfreed resources in se on quitting, including
se->virtio_dev, se->vu_socket_path, se->vu_socketfd.

Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days ago virtiofsd: fix error handling in main()
Liu Bo [Wed, 5 Jun 2019 00:42:35 +0000 (08:42 +0800)] 
virtiofsd: fix error handling in main()

Neither fuse_parse_cmdline() nor fuse_opt_parse() goes to the right place
to do cleanup.

Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days ago virtiofsd: support nanosecond resolution for file timestamp
Jiufei Xue [Tue, 16 Apr 2019 19:08:56 +0000 (03:08 +0800)] 
virtiofsd: support nanosecond resolution for file timestamp

Define HAVE_STRUCT_STAT_ST_ATIM to 1 if `st_atim' is a member of `struct
stat', which means nanosecond resolution is supported for the file timestamp
fields.
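
How such a feature macro is typically consumed can be sketched as follows.
sketch_atime_nsec() is a hypothetical accessor; the sketch assumes the build
system defines HAVE_STRUCT_STAT_ST_ATIM and falls back to zero nanoseconds
when it does not:

```c
#include <sys/stat.h>

/*
 * Return the nanosecond part of the access time when the platform's
 * struct stat exposes st_atim, and 0 (second resolution only) when the
 * macro was not defined at build time.
 */
static long sketch_atime_nsec(const struct stat *st)
{
#ifdef HAVE_STRUCT_STAT_ST_ATIM
    return st->st_atim.tv_nsec;
#else
    (void)st; /* field unavailable: report whole seconds only */
    return 0;
#endif
}
```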

Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days ago virtiofsd: Clean up inodes on destroy
Dr. David Alan Gilbert [Fri, 22 Feb 2019 18:33:52 +0000 (18:33 +0000)] 
virtiofsd: Clean up inodes on destroy

Clear out our inodes and fd's on a 'destroy' - so we get rid
of them if we reboot the guest.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days ago virtiofsd: passthrough_ll: use hashtable
Miklos Szeredi [Thu, 15 Nov 2018 14:29:51 +0000 (15:29 +0100)] 
virtiofsd: passthrough_ll: use hashtable

Improve performance of inode lookup by using a hash table.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days ago virtiofsd: passthrough_ll: clean up cache related options
Miklos Szeredi [Thu, 15 Nov 2018 14:29:51 +0000 (15:29 +0100)] 
virtiofsd: passthrough_ll: clean up cache related options

 - Rename "cache=never" to "cache=none" to match 9p's similar option.

 - Rename CACHE_NORMAL constant to CACHE_AUTO to match the "cache=auto"
   option.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days ago virtiofsd: extract root inode init into setup_root()
Miklos Szeredi [Wed, 20 Nov 2019 14:25:50 +0000 (14:25 +0000)] 
virtiofsd: extract root inode init into setup_root()

Initialize the root inode in a single place.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
dgilbert:
with fix suggested by Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Reviewed-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days ago virtiofsd: fail when parent inode isn't known in lo_do_lookup()
Miklos Szeredi [Wed, 20 Nov 2019 14:14:29 +0000 (14:14 +0000)] 
virtiofsd: fail when parent inode isn't known in lo_do_lookup()

The Linux file handle APIs (struct export_operations) can access inodes
that are not attached to parents because path name traversal is not
performed.  Refuse if there is no parent in lo_do_lookup().

Also clean up lo_do_lookup() while we're here.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days ago virtiofsd: rename unref_inode() to unref_inode_lolocked()
Miklos Szeredi [Wed, 20 Nov 2019 14:11:09 +0000 (14:11 +0000)] 
virtiofsd: rename unref_inode() to unref_inode_lolocked()

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days ago virtiofsd: passthrough_ll: control readdirplus
Miklos Szeredi [Thu, 16 Aug 2018 09:14:13 +0000 (11:14 +0200)] 
virtiofsd: passthrough_ll: control readdirplus

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days ago virtiofsd: passthrough_ll: disable readdirplus on cache=never
Miklos Szeredi [Thu, 16 Aug 2018 09:14:13 +0000 (11:14 +0200)] 
virtiofsd: passthrough_ll: disable readdirplus on cache=never

...because the attributes sent in the READDIRPLUS reply would be discarded
anyway.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days ago virtiofsd: passthrough_ll: add renameat2 support
Miklos Szeredi [Wed, 15 Aug 2018 15:05:29 +0000 (17:05 +0200)] 
virtiofsd: passthrough_ll: add renameat2 support

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days agocontrib/libvhost-user: Protect slave fd with mutex
Dr. David Alan Gilbert [Fri, 1 Mar 2019 11:18:30 +0000 (11:18 +0000)] 
contrib/libvhost-user: Protect slave fd with mutex

Future patches will issue commands on the slave fd in response to
commands received on the queues.  Because each queue will be driven by
its own thread, we need to make sure those threads don't attempt to use
the slave fd for multiple commands in parallel.
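
The locking pattern can be sketched as follows. This is only an illustration of serialising access to one shared fd with a pthread mutex; the struct and function names are hypothetical and are not the libvhost-user API.

```c
#include <pthread.h>
#include <unistd.h>

/* Hypothetical wrapper: one fd shared by several queue threads. */
struct slave_channel {
    int fd;
    pthread_mutex_t lock;
};

/*
 * Send one message while holding the lock, so that two threads can
 * never interleave their messages on the same fd.
 */
static ssize_t slave_send(struct slave_channel *ch, const void *msg, size_t len)
{
    ssize_t n;

    pthread_mutex_lock(&ch->lock);
    n = write(ch->fd, msg, len);
    pthread_mutex_unlock(&ch->lock);
    return n;
}
```

A real command exchange would normally keep the lock held until the matching reply has been read, not just for the write.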

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days agovhost-user: Print unexpected slave message types
Dr. David Alan Gilbert [Thu, 7 Feb 2019 18:22:40 +0000 (18:22 +0000)] 
vhost-user: Print unexpected slave message types

When we receive an unexpected message type on the slave fd, print
the type.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days agovirtiofsd: Kill threads when queues are stopped
Dr. David Alan Gilbert [Fri, 23 Nov 2018 18:19:31 +0000 (18:19 +0000)] 
virtiofsd: Kill threads when queues are stopped

Kill the threads we've started when the queues get stopped.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
With improvements by:
Signed-off-by: Eryu Guan <eguan@linux.alibaba.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days agovirtiofsd: Handle hard reboot
Dr. David Alan Gilbert [Thu, 22 Nov 2018 16:05:09 +0000 (16:05 +0000)] 
virtiofsd: Handle hard reboot

Handle the sequence:
  mount
  hard reboot (without unmount)
  mount

In this case we get another 'init', which FUSE doesn't normally expect.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days agovirtiofsd: Handle reinit
Dr. David Alan Gilbert [Wed, 21 Nov 2018 18:02:07 +0000 (18:02 +0000)] 
virtiofsd: Handle reinit

Allow init->destroy->init  for mount->umount->mount

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days agovirtiofsd: Add timestamp to the log with FUSE_LOG_DEBUG level
Masayoshi Mizuma [Wed, 6 Nov 2019 19:06:02 +0000 (14:06 -0500)] 
virtiofsd: Add timestamp to the log with FUSE_LOG_DEBUG level

virtiofsd runs several threads, so the debug option produces a lot of log
output.  A timestamp on each line makes that output easier to debug.

Add a nanosecond timestamp, obtained from get_clock(), to log messages at
FUSE_LOG_DEBUG level if the syslog option isn't set.

The log looks like this:

  # ./virtiofsd -d -o vhost_user_socket=/tmp/vhostqemu0 -o source=/tmp/share0 -o cache=auto
  ...
  [5365943125463727] [ID: 00000002] fv_queue_thread: Start for queue 0 kick_fd 9
  [5365943125568644] [ID: 00000002] fv_queue_thread: Waiting for Queue 0 event
  [5365943125573561] [ID: 00000002] fv_queue_thread: Got queue event on Queue 0

Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days agovirtiofsd: Add ID to the log with FUSE_LOG_DEBUG level
Masayoshi Mizuma [Wed, 6 Nov 2019 19:06:01 +0000 (14:06 -0500)] 
virtiofsd: Add ID to the log with FUSE_LOG_DEBUG level

virtiofsd runs several threads, so the debug option produces a lot of log
output.  It is useful for debugging to be able to identify the specific
thread from the log.

Add an ID, obtained from gettid(), to log messages at FUSE_LOG_DEBUG level
so that we can grep for a specific thread.

The log looks like this:

  # ./virtiofsd -d -o vhost_user_socket=/tmp/vhostqemu0 -o source=/tmp/share0 -o cache=auto
  ...
  [ID: 00000097]    unique: 12696, success, outsize: 120
  [ID: 00000097] virtio_send_msg: elem 18: with 2 in desc of length 120
  [ID: 00000003] fv_queue_thread: Got queue event on Queue 1
  [ID: 00000003] fv_queue_thread: Queue 1 gave evalue: 1 available: in: 65552 out: 80
  [ID: 00000003] fv_queue_thread: Waiting for Queue 1 event
  [ID: 00000071] fv_queue_worker: elem 33: with 2 out desc of length 80 bad_in_num=0 bad_out_num=0
  [ID: 00000071] unique: 12694, opcode: READ (15), nodeid: 2, insize: 80, pid: 2014
  [ID: 00000071] lo_read(ino=2, size=65536, off=131072)
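
For illustration, a prefix of this shape could be produced roughly as below. The helper names are hypothetical, and the thread ID is fetched through the raw gettid system call because older C libraries do not provide a gettid() wrapper.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Kernel thread ID of the calling thread (Linux-specific). */
static long current_tid(void)
{
    return syscall(SYS_gettid);
}

/* Write "[ID: <tid>] <msg>" into out and return snprintf()'s result. */
static int format_with_tid(char *out, size_t outsz, const char *msg)
{
    return snprintf(out, outsz, "[ID: %08ld] %s", current_tid(), msg);
}
```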

Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
  added rework as suggested by Daniel P. Berrangé during review
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days agovirtiofsd: print log only when priority is high enough
Eryu Guan [Fri, 9 Aug 2019 08:25:35 +0000 (16:25 +0800)] 
virtiofsd: print log only when priority is high enough

Introduce the "-o log_level=" command line option to specify the current
log level (priority).  Valid values are "debug", "info", "warn" and "err",
e.g.

    ./virtiofsd -o log_level=debug ...

Only messages whose priority is at or above the current level are printed
to stderr/syslog.  The default level is info.

The "-o debug"/"-d" options are kept, and imply debug log level.
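
The filtering rule can be sketched as follows, assuming syslog-style numeric priorities where a smaller number means a more severe message. The function and variable names are hypothetical.

```c
#include <stdarg.h>
#include <stdio.h>
#include <syslog.h>

/* Current threshold; info is the default named in the commit message. */
static int current_log_level = LOG_INFO;

/* A message is printed only if it is at least as severe as the threshold. */
static int should_log(int priority)
{
    return priority <= current_log_level;
}

/* Print to stderr when allowed; returns the number of bytes written. */
static int log_msg(int priority, const char *fmt, ...)
{
    va_list ap;
    int n;

    if (!should_log(priority)) {
        return 0;
    }
    va_start(ap, fmt);
    n = vfprintf(stderr, fmt, ap);
    va_end(ap);
    return n;
}
```

With the default threshold, LOG_DEBUG messages are dropped; setting current_log_level to LOG_DEBUG lets everything through.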

Signed-off-by: Eryu Guan <eguan@linux.alibaba.com>
dgilbert: Reworked for libfuse's log_func
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
with fix by:
Signed-off-by: Xiao Yang <yangx.jy@cn.fujitsu.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days agovirtiofsd: add --syslog command-line option
Stefan Hajnoczi [Wed, 26 Jun 2019 09:25:54 +0000 (10:25 +0100)] 
virtiofsd: add --syslog command-line option

Sometimes collecting output from stderr is inconvenient or does not fit
within the overall logging architecture.  Add syslog(3) support for
cases where stderr cannot be used.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
dgilbert: Reworked as a logging function
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days agovirtiofsd: fix libfuse information leaks
Stefan Hajnoczi [Fri, 22 Nov 2019 11:31:30 +0000 (11:31 +0000)] 
virtiofsd: fix libfuse information leaks

Some FUSE message replies contain padding fields that are not
initialized by libfuse.  This is fine in traditional FUSE applications
because the kernel is trusted.  virtiofsd does not trust the guest and
must not expose uninitialized memory.

Use C struct initializers to automatically zero out memory.  Not all of
these code changes are strictly necessary but they will prevent future
information leaks if the structs are extended.
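
As a small illustration of the technique, the reply struct below is hypothetical (it is not a real FUSE message), but it has an explicit padding field of the kind described. A designated initializer sets the named field and zero-initializes every member that is not mentioned.

```c
#include <stdint.h>

/* Hypothetical reply layout with an explicit padding field. */
struct demo_reply {
    uint64_t value;
    uint32_t flags;
    uint32_t padding;
};

static struct demo_reply make_reply(uint64_t value)
{
    /* flags and padding are not named, so they are initialized to zero. */
    struct demo_reply r = { .value = value };

    return r;
}
```

If a field is later added to the struct, it is zeroed as well without any change to make_reply().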

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days agovirtiofsd: set maximum RLIMIT_NOFILE limit
Stefan Hajnoczi [Fri, 22 Mar 2019 15:54:13 +0000 (15:54 +0000)] 
virtiofsd: set maximum RLIMIT_NOFILE limit

virtiofsd can exceed the default open file descriptor limit easily on
most systems.  Take advantage of the fact that it runs as root to raise
the limit.
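
A reduced sketch of raising the limit is shown below. It only lifts the soft limit up to the existing hard limit, which any process may do; a process running as root could raise the hard limit too, which is not shown. The function name is hypothetical.

```c
#include <sys/resource.h>

/* Raise the soft RLIMIT_NOFILE to the hard limit; returns 0 on success. */
static int raise_nofile_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) < 0) {
        return -1;
    }
    if (rl.rlim_cur == rl.rlim_max) {
        return 0; /* already at the maximum */
    }
    rl.rlim_cur = rl.rlim_max;
    return setrlimit(RLIMIT_NOFILE, &rl);
}
```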

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days agovirtiofsd: Drop CAP_FSETID if client asked for it
Vivek Goyal [Tue, 13 Aug 2019 19:29:44 +0000 (15:29 -0400)] 
virtiofsd: Drop CAP_FSETID if client asked for it

If the client requested that the setuid/setgid bits be killed on the file
being written, drop the CAP_FSETID capability so that the setuid/setgid bits
are cleared automatically upon write.

pjdfstest chown/12.t needs this.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
  dgilbert: reworked for libcap-ng
Reviewed-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days agovirtiofsd: cap-ng helpers
Dr. David Alan Gilbert [Tue, 3 Dec 2019 12:23:44 +0000 (12:23 +0000)] 
virtiofsd: cap-ng helpers

libcap-ng reads /proc during capng_get_caps_process, and virtiofsd's
sandboxing doesn't have /proc mounted; thus we have to do the
caps read before we sandbox it and save/restore the state.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days agovirtiofsd: Parse flag FUSE_WRITE_KILL_PRIV
Vivek Goyal [Tue, 13 Aug 2019 19:29:42 +0000 (15:29 -0400)] 
virtiofsd: Parse flag FUSE_WRITE_KILL_PRIV

Caller can set FUSE_WRITE_KILL_PRIV in write_flags. Parse it and pass it
to the filesystem.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days agovirtiofsd: add seccomp whitelist
Stefan Hajnoczi [Wed, 13 Mar 2019 09:32:51 +0000 (09:32 +0000)] 
virtiofsd: add seccomp whitelist

Only allow system calls that are needed by virtiofsd.  All other system
calls cause SIGSYS to be directed at the thread and the process will
coredump.

Restricting system calls reduces the kernel attack surface and limits
what the process can do when compromised.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
with additional entries by:
Signed-off-by: Ganesh Maharaj Mahalingam <ganesh.mahalingam@intel.com>
Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Signed-off-by: piaojun <piaojun@huawei.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Eric Ren <renzhen@linux.alibaba.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days agovirtiofsd: move to a new pid namespace
Stefan Hajnoczi [Wed, 16 Oct 2019 16:01:57 +0000 (17:01 +0100)] 
virtiofsd: move to a new pid namespace

virtiofsd needs access to /proc/self/fd.  Let's move to a new pid
namespace so that a compromised process cannot see any other
processes running on the system.

One wrinkle in this approach: unshare(CLONE_NEWPID) affects *child*
processes and not the current process.  Therefore we need to fork the
pid 1 process that will actually run virtiofsd and leave a parent in
waitpid(2).  This is not the same thing as daemonization and parent
processes should not notice a difference.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days agovirtiofsd: move to an empty network namespace
Stefan Hajnoczi [Wed, 16 Oct 2019 16:01:56 +0000 (17:01 +0100)] 
virtiofsd: move to an empty network namespace

If the process is compromised there should be no network access.  Use an
empty network namespace to sandbox networking.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days agovirtiofsd: sandbox mount namespace
Stefan Hajnoczi [Tue, 12 Mar 2019 15:51:38 +0000 (15:51 +0000)] 
virtiofsd: sandbox mount namespace

Use a mount namespace with the shared directory tree mounted at "/" and
no other mounts.

This prevents symlink escape attacks because symlink targets are
resolved only against the shared directory and cannot go outside it.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Peng Tao <tao.peng@linux.alibaba.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days agovirtiofsd: use /proc/self/fd/ O_PATH file descriptor
Stefan Hajnoczi [Tue, 12 Mar 2019 15:48:50 +0000 (15:48 +0000)] 
virtiofsd: use /proc/self/fd/ O_PATH file descriptor

Sandboxing will remove /proc from the mount namespace so we can no
longer build string paths into "/proc/self/fd/...".

Keep an O_PATH file descriptor so we can still re-open fds via
/proc/self/fd.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days agovirtiofsd: prevent ".." escape in lo_do_readdir()
Stefan Hajnoczi [Tue, 5 Mar 2019 09:32:49 +0000 (09:32 +0000)] 
virtiofsd: prevent ".." escape in lo_do_readdir()

Construct a fake dirent for the root directory's ".." entry.  This hides
the parent directory from the FUSE client.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days agovirtiofsd: prevent ".." escape in lo_do_lookup()
Stefan Hajnoczi [Mon, 4 Mar 2019 10:38:46 +0000 (10:38 +0000)] 
virtiofsd: prevent ".." escape in lo_do_lookup()

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days agovirtiofsd: check input buffer size in fuse_lowlevel.c ops
Stefan Hajnoczi [Thu, 28 Feb 2019 16:38:31 +0000 (16:38 +0000)] 
virtiofsd: check input buffer size in fuse_lowlevel.c ops

Each FUSE operation involves parsing the input buffer.  Currently the
code assumes the input buffer is large enough for the expected
arguments.  This patch uses fuse_mbuf_iter to check the size.

Most operations are simple to convert.  Some are more complicated due to
variable-length inputs or different sizes depending on the protocol
version.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days agovirtiofsd: validate input buffer sizes in do_write_buf()
Stefan Hajnoczi [Thu, 28 Feb 2019 11:25:32 +0000 (11:25 +0000)] 
virtiofsd: validate input buffer sizes in do_write_buf()

There is a small change in behavior: if fuse_write_in->size doesn't
match the input buffer size then the request is failed.  Previously
write requests with 1 fuse_buf element would truncate to
fuse_write_in->size.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days agovirtiofsd: add fuse_mbuf_iter API
Stefan Hajnoczi [Thu, 28 Feb 2019 10:30:20 +0000 (10:30 +0000)] 
virtiofsd: add fuse_mbuf_iter API

Introduce an API for consuming bytes from a buffer with size checks.
All FUSE operations will be converted to use this safe API instead of
void *inarg.
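
The idea of a size-checked consumer can be sketched as below. This is a simplified illustration with hypothetical names, not the fuse_mbuf_iter code itself: every read either fits in the remaining bytes or fails without moving the cursor.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical iterator over an untrusted input buffer. */
struct buf_iter {
    const char *mem; /* start of the buffer */
    size_t size;     /* total bytes available */
    size_t pos;      /* bytes consumed so far, always <= size */
};

/* Consume len bytes; returns their address, or NULL if too few remain. */
static const void *buf_iter_advance(struct buf_iter *it, size_t len)
{
    const void *p;

    if (len > it->size - it->pos) {
        return NULL;
    }
    p = it->mem + it->pos;
    it->pos += len;
    return p;
}

/* Consume a NUL-terminated string; NULL if no terminator is in range. */
static const char *buf_iter_advance_str(struct buf_iter *it)
{
    const char *s = it->mem + it->pos;
    const char *nul = memchr(s, '\0', it->size - it->pos);

    if (nul == NULL) {
        return NULL;
    }
    it->pos += (size_t)(nul - s) + 1;
    return s;
}
```

A handler would call buf_iter_advance() once per fixed-size argument struct and fail the request as soon as NULL comes back.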

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days agovirtiofsd: Pass write iov's all the way through
Dr. David Alan Gilbert [Fri, 4 Jan 2019 16:47:39 +0000 (16:47 +0000)] 
virtiofsd: Pass write iov's all the way through

Pass the write iov pointing to guest RAM all the way through rather
than copying the data.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Xiao Yang <yangx.jy@cn.fujitsu.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days agovirtiofsd: Plumb fuse_bufvec through to do_write_buf
Dr. David Alan Gilbert [Fri, 4 Jan 2019 18:23:00 +0000 (18:23 +0000)] 
virtiofsd: Plumb fuse_bufvec through to do_write_buf

Let fuse_session_process_buf_int take a fuse_bufvec * instead of a
fuse_buf;  and then through to do_write_buf - where in the best
case it can pass that straight through to op.write_buf without copying
(other than skipping a header).

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days agovirtiofsd: validate path components
Stefan Hajnoczi [Tue, 26 Feb 2019 17:58:59 +0000 (17:58 +0000)] 
virtiofsd: validate path components

Several FUSE requests contain single path components.  A correct FUSE
client sends well-formed path components but there is currently no input
validation in case something went wrong or the client is malicious.

Refuse ".", "..", and paths containing '/' when we expect a path
component.
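
The three rejection rules can be written as a small predicate. The function name is hypothetical; the checks are exactly the ones listed above.

```c
#include <stdbool.h>
#include <string.h>

/* True if name is acceptable where a single path component is expected. */
static bool is_single_path_component(const char *name)
{
    if (strchr(name, '/') != NULL) {
        return false; /* would span more than one component */
    }
    if (strcmp(name, ".") == 0 || strcmp(name, "..") == 0) {
        return false; /* would refer to the directory itself or its parent */
    }
    return true;
}
```

Names that merely start with a dot, such as ".hidden" or "...", are still accepted.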

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days agovirtiofsd: passthrough_ll: add fallback for racy ops
Miklos Szeredi [Wed, 14 Nov 2018 15:52:03 +0000 (16:52 +0100)] 
virtiofsd: passthrough_ll: add fallback for racy ops

We have two operations that cannot be done race-free on a symlink in
certain cases: utimes and link.

Add racy fallback for these if the race-free method doesn't work.  We do
our best to avoid races even in this case:

  - get absolute path by reading /proc/self/fd/NN symlink

  - lookup parent directory: after this we are safe against renames in
    ancestors

  - lookup name in parent directory, and verify that we got to the original
    inode; if not, retry the whole thing

Both utimes(2) and link(2) hold i_lock on the inode across the operation,
so a racing rename/delete by this fuse instance is not possible, only from
other entities changing the filesystem.

If the "norace" option is given, then disable the racy fallbacks.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days agovirtiofsd: passthrough_ll: add fd_map to hide file descriptors
Stefan Hajnoczi [Thu, 31 Jan 2019 06:02:40 +0000 (14:02 +0800)] 
virtiofsd: passthrough_ll: add fd_map to hide file descriptors

Do not expose file descriptor numbers to clients.  This prevents the
abuse of internal file descriptors (like stdin/stdout).

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Fix from:
Signed-off-by: Xiao Yang <yangx.jy@cn.fujitsu.com>
dgilbert:
  Added lseek
Reviewed-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days agovirtiofsd: passthrough_ll: add dirp_map to hide lo_dirp pointers
Stefan Hajnoczi [Thu, 31 Jan 2019 06:01:28 +0000 (14:01 +0800)] 
virtiofsd: passthrough_ll: add dirp_map to hide lo_dirp pointers

Do not expose lo_dirp pointers to clients.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days agovirtiofsd: passthrough_ll: add ino_map to hide lo_inode pointers
Stefan Hajnoczi [Thu, 31 Jan 2019 04:57:34 +0000 (12:57 +0800)] 
virtiofsd: passthrough_ll: add ino_map to hide lo_inode pointers

Do not expose lo_inode pointers to clients.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days agovirtiofsd: passthrough_ll: add lo_map for ino/fh indirection
Stefan Hajnoczi [Thu, 31 Jan 2019 04:49:39 +0000 (12:49 +0800)] 
virtiofsd: passthrough_ll: add lo_map for ino/fh indirection

A layer of indirection is needed because passthrough_ll cannot expose
pointers or file descriptor numbers to untrusted clients.  Malicious
clients could send invalid pointers or file descriptors in order to
crash or exploit the file system daemon.

lo_map provides an integer key->value mapping.  This will be used for
ino and fh fields in the patches that follow.
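
One way to build such an integer key->value mapping is an array whose unused slots form a free list. The sketch below is an assumption about a reasonable design with hypothetical names, not the lo_map code. Clients only ever see small integer keys, and every lookup is bounds-checked.

```c
#include <stddef.h>
#include <stdlib.h>

struct id_map_elem {
    void *value;      /* meaningful while in_use is set */
    size_t next_free; /* meaningful while in_use is clear */
    int in_use;
};

struct id_map {
    struct id_map_elem *elems;
    size_t nelems;
    size_t free_head; /* first free slot, or nelems when none is free */
};

static void id_map_init(struct id_map *m)
{
    m->elems = NULL;
    m->nelems = 0;
    m->free_head = 0;
}

/* Double the array; only called when the free list is empty. */
static int id_map_grow(struct id_map *m)
{
    size_t new_n = m->nelems ? m->nelems * 2 : 4;
    struct id_map_elem *e = realloc(m->elems, new_n * sizeof(*e));
    size_t i;

    if (e == NULL) {
        return -1;
    }
    for (i = m->nelems; i < new_n; i++) {
        e[i].value = NULL;
        e[i].in_use = 0;
        e[i].next_free = i + 1; /* chain the new slots together */
    }
    m->elems = e;
    m->free_head = m->nelems;
    m->nelems = new_n;
    return 0;
}

/* Store value and return its key, or -1 if memory runs out. */
static long id_map_add(struct id_map *m, void *value)
{
    size_t key;

    if (m->free_head == m->nelems && id_map_grow(m) < 0) {
        return -1;
    }
    key = m->free_head;
    m->free_head = m->elems[key].next_free;
    m->elems[key].value = value;
    m->elems[key].in_use = 1;
    return (long)key;
}

/* Look up a client-supplied key; unknown or stale keys yield NULL. */
static void *id_map_get(const struct id_map *m, size_t key)
{
    if (key >= m->nelems || !m->elems[key].in_use) {
        return NULL;
    }
    return m->elems[key].value;
}

/* Release a key so the slot can be reused; invalid keys are ignored. */
static void id_map_remove(struct id_map *m, size_t key)
{
    if (key >= m->nelems || !m->elems[key].in_use) {
        return;
    }
    m->elems[key].in_use = 0;
    m->elems[key].value = NULL;
    m->elems[key].next_free = m->free_head;
    m->free_head = key;
}
```

Because an invalid key simply returns NULL, a malicious client cannot make the daemon dereference an arbitrary pointer.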

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days agovirtiofsd: passthrough_ll: create new files in caller's context
Vivek Goyal [Wed, 15 Aug 2018 15:05:29 +0000 (17:05 +0200)] 
virtiofsd: passthrough_ll: create new files in caller's context

We need to create files in the caller's context. Otherwise after
creating a file, the caller might not be able to do file operations on
that file.

Change the effective uid/gid to the caller's uid/gid, create the file, and
then switch back to uid/gid 0.

Use syscall(setresuid, ...) otherwise glibc does some magic to change EUID
in all threads, which is not what we want.
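
A sketch of the raw system call approach follows. The helper names are hypothetical. Passing -1 leaves the real and saved IDs untouched, and because the kernel is called directly, only the calling thread is affected. On some 32-bit architectures the appropriate syscall numbers are SYS_setresuid32 and SYS_setresgid32, which this sketch does not handle.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Change only the calling thread's effective UID. */
static int thread_set_euid(uid_t euid)
{
    return (int)syscall(SYS_setresuid, (long)-1, (long)euid, (long)-1);
}

/* Change only the calling thread's effective GID. */
static int thread_set_egid(gid_t egid)
{
    return (int)syscall(SYS_setresgid, (long)-1, (long)egid, (long)-1);
}
```

The intended pattern is to switch to the caller's IDs, perform the create, and then switch back to 0 in the same thread.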

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days agovirtiofs: Add maintainers entry
Dr. David Alan Gilbert [Mon, 21 Oct 2019 10:41:36 +0000 (11:41 +0100)] 
virtiofs: Add maintainers entry

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days agovirtiofsd: add --print-capabilities option
Stefan Hajnoczi [Tue, 27 Aug 2019 09:54:35 +0000 (10:54 +0100)] 
virtiofsd: add --print-capabilities option

Add the --print-capabilities option as per vhost-user.rst "Backend
program conventions".  Currently there are no advertised features.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days agovirtiofsd: add vhost-user.json file
Stefan Hajnoczi [Tue, 27 Aug 2019 09:54:37 +0000 (10:54 +0100)] 
virtiofsd: add vhost-user.json file

Install a vhost-user.json file describing virtiofsd.  This allows
libvirt and other management tools to enumerate vhost-user backend
programs.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days agovirtiofsd: make -f (foreground) the default
Stefan Hajnoczi [Tue, 27 Aug 2019 09:54:34 +0000 (10:54 +0100)] 
virtiofsd: make -f (foreground) the default

According to vhost-user.rst "Backend program conventions", backend
programs should run in the foreground by default.  Follow the
conventions so libvirt and other management tools can control virtiofsd
in a standard way.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days agovirtiofsd: add --fd=FDNUM fd passing option
Stefan Hajnoczi [Tue, 25 Jun 2019 16:18:00 +0000 (17:18 +0100)] 
virtiofsd: add --fd=FDNUM fd passing option

Although --socket-path=PATH is useful for manual invocations, management
tools typically create the UNIX domain socket themselves and pass it to
the vhost-user device backend.  This way QEMU can be launched
immediately with a valid socket.  No waiting for the vhost-user device
backend is required when fd passing is used.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days agovirtiofsd: Fast path for virtio read
Dr. David Alan Gilbert [Wed, 15 Aug 2018 19:26:05 +0000 (20:26 +0100)] 
virtiofsd: Fast path for virtio read

Readv the data straight into the guest's buffer.
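
The scatter read itself can be illustrated with preadv(2). In the sketch below the two local arrays merely stand in for guest buffers, and the function names are hypothetical; the point is that the file data lands directly in the destination segments without an intermediate copy.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Read from fd at offset straight into the caller's buffer segments. */
static ssize_t read_into_iov(int fd, off_t offset, struct iovec *iov, int iovcnt)
{
    return preadv(fd, iov, iovcnt, offset);
}

/* Self-check: split an 11-byte file across two buffers. 0 on success. */
static int demo_read(void)
{
    char first[5], second[6];
    struct iovec iov[2] = {
        { .iov_base = first,  .iov_len = sizeof(first)  },
        { .iov_base = second, .iov_len = sizeof(second) },
    };
    FILE *f = tmpfile();
    int ok;

    if (f == NULL) {
        return -1;
    }
    if (fputs("hello world", f) == EOF || fflush(f) != 0) {
        fclose(f);
        return -1;
    }
    ok = read_into_iov(fileno(f), 0, iov, 2) == 11 &&
         memcmp(first, "hello", 5) == 0 &&
         memcmp(second, " world", 6) == 0;
    fclose(f);
    return ok ? 0 : -1;
}
```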

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
With fix by:
Signed-off-by: Eryu Guan <eguan@linux.alibaba.com>
Reviewed-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days agovirtiofsd: Add Makefile wiring for virtiofsd contrib
Dr. David Alan Gilbert [Thu, 7 Feb 2019 12:17:21 +0000 (12:17 +0000)] 
virtiofsd: Add Makefile wiring for virtiofsd contrib

Wire up the building of the virtiofsd in tools.

virtiofsd relies on Linux-specific system calls and seccomp.  Anyone
wishing to port it to other host operating systems should do so
carefully and without reducing security.

Only allow building on Linux hosts.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days agovirtiofsd: Keep track of replies
Dr. David Alan Gilbert [Thu, 21 Jun 2018 09:38:03 +0000 (10:38 +0100)] 
virtiofsd: Keep track of replies

Keep track of whether we sent a reply to a request; this is a bit
paranoid but it means:
  a) We should always recycle an element even if there was an error
     in the request
  b) Never try and send two replies on one queue element

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days agovirtiofsd: Send replies to messages
Dr. David Alan Gilbert [Mon, 18 Jun 2018 17:46:01 +0000 (18:46 +0100)] 
virtiofsd: Send replies to messages

Route fuse out messages back through the same queue elements
that had the command that triggered the request.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days agovirtiofsd: Start reading commands from queue
Dr. David Alan Gilbert [Thu, 14 Jun 2018 18:52:23 +0000 (19:52 +0100)] 
virtiofsd: Start reading commands from queue

Pop queue elements off queues, copy the data from them and
pass that to fuse.

  Note: 'out' in a VuVirtqElement is from QEMU
        'in' in libfuse is into the daemon

  So we read from the out iov's to get a fuse_in_header

When we get a kick we've got to read all the elements until the queue
is empty.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days agovirtiofsd: Poll kick_fd for queue
Dr. David Alan Gilbert [Thu, 14 Jun 2018 11:07:07 +0000 (12:07 +0100)] 
virtiofsd: Poll kick_fd for queue

In the queue thread poll the kick_fd we're passed.
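
Assuming the kick fd is an eventfd, as is typical for vhost-user queues, waiting for a kick can be sketched like this. The function name and the timeout handling are illustrative only.

```c
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Wait up to timeout_ms for a kick.
 * Returns 1 and stores the eventfd counter in *count when kicked,
 * 0 on timeout, -1 on error.
 */
static int wait_for_kick(int kick_fd, int timeout_ms, uint64_t *count)
{
    struct pollfd pfd = { .fd = kick_fd, .events = POLLIN };
    int rc = poll(&pfd, 1, timeout_ms);

    if (rc <= 0) {
        return rc;
    }
    if (!(pfd.revents & POLLIN)) {
        return -1;
    }
    /* Reading resets the counter so the next kick is seen as new. */
    if (read(kick_fd, count, sizeof(*count)) != (ssize_t)sizeof(*count)) {
        return -1;
    }
    return 1;
}
```

A queue thread would call this in a loop and drain the queue after each successful return.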

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days agovirtiofsd: Start queue threads
Dr. David Alan Gilbert [Wed, 13 Jun 2018 19:17:51 +0000 (20:17 +0100)] 
virtiofsd: Start queue threads

Start a thread for each queue when we get notified it's been started.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
fix by:
Signed-off-by: Jun Piao <piaojun@huawei.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days agovirtiofsd: get/set features callbacks
Dr. David Alan Gilbert [Wed, 13 Jun 2018 19:17:17 +0000 (20:17 +0100)] 
virtiofsd: get/set features callbacks

Add the get/set features callbacks.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days agovirtiofsd: Add main virtio loop
Dr. David Alan Gilbert [Tue, 12 Jun 2018 15:31:24 +0000 (16:31 +0100)] 
virtiofsd: Add main virtio loop

Processes incoming requests on the vhost-user fd.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days agovirtiofsd: Start wiring up vhost-user
Dr. David Alan Gilbert [Fri, 8 Jun 2018 18:59:20 +0000 (19:59 +0100)] 
virtiofsd: Start wiring up vhost-user

Listen on our unix socket for the connection from QEMU, when we get it
initialise vhost-user and dive into our own loop variant (currently
dummy).

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days agovirtiofsd: Open vhost connection instead of mounting
Dr. David Alan Gilbert [Thu, 7 Jun 2018 19:11:14 +0000 (20:11 +0100)] 
virtiofsd: Open vhost connection instead of mounting

When run with vhost-user options we connect to QEMU via a socket instead
of mounting.  Start this off by creating the socket.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days agovirtiofsd: add -o source=PATH to help output
Stefan Hajnoczi [Fri, 8 Mar 2019 12:23:55 +0000 (12:23 +0000)] 
virtiofsd: add -o source=PATH to help output

The -o source=PATH option will be used by most command-line invocations.
Let's document it!

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days agovirtiofsd: Add options for virtio
Dr. David Alan Gilbert [Thu, 7 Jun 2018 16:22:33 +0000 (17:22 +0100)] 
virtiofsd: Add options for virtio

Add options to specify parameters for virtio-fs paths, i.e.

   ./virtiofsd -o vhost_user_socket=/tmp/vhostqemu

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days agovirtiofsd: Make fsync work even if only inode is passed in
Vivek Goyal [Thu, 30 Aug 2018 18:22:10 +0000 (14:22 -0400)] 
virtiofsd: Make fsync work even if only inode is passed in

If the caller has not sent a file handle in the request, use the inode to
retrieve the fd opened with O_PATH, use that to open the file again, and
issue fsync.  This will be needed when dax_flush() calls fsync; at that
point we only have inode information (and not a file).

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days agovitriofsd/passthrough_ll: fix fallocate() ifdefs
Xiao Yang [Mon, 13 Jan 2020 09:37:34 +0000 (17:37 +0800)] 
vitriofsd/passthrough_ll: fix fallocate() ifdefs

1) Use the correct CONFIG_FALLOCATE macro to check whether fallocate() is
   supported (i.e. the configure script sets CONFIG_FALLOCATE instead of
   HAVE_FALLOCATE if fallocate() is supported).
2) Replace HAVE_POSIX_FALLOCATE with CONFIG_POSIX_FALLOCATE.

Signed-off-by: Xiao Yang <yangx.jy@cn.fujitsu.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
  Merged from two of Xiao Yang's patches

2 days agovirtiofsd: Trim out compatibility code
Dr. David Alan Gilbert [Wed, 27 Nov 2019 17:31:24 +0000 (17:31 +0000)] 
virtiofsd: Trim out compatibility code

virtiofsd only supports major=7, minor>=31; trim out a lot of
old compatibility code.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2 days agovirtiofsd: Fix common header and define for QEMU builds
Dr. David Alan Gilbert [Fri, 8 Feb 2019 11:48:42 +0000 (11:48 +0000)] 
virtiofsd: Fix common header and define for QEMU builds

All of the fuse files include config.h and define GNU_SOURCE, neither of
which we have under our build; remove them.
Fix up the path to the kernel's fuse.h in QEMU's tree.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>