qemu.git
14 months agoUpdate version for 4.1.1 release stable-4.1 github/stable-4.1 v4.1.1
Michael Roth [Thu, 14 Nov 2019 18:04:03 +0000 (12:04 -0600)] 
Update version for 4.1.1 release

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
14 months agomirror: Keep mirror_top_bs drained after dropping permissions
Kevin Wolf [Mon, 22 Jul 2019 15:44:27 +0000 (17:44 +0200)] 
mirror: Keep mirror_top_bs drained after dropping permissions

mirror_top_bs is currently implicitly drained through its connection to
the source or the target node. However, the drain section for target_bs
ends early after moving mirror_top_bs from src to target_bs, so that
requests can already be restarted while mirror_top_bs is still present
in the chain, but has dropped all permissions and therefore runs into an
assertion failure like this:

    qemu-system-x86_64: block/io.c:1634: bdrv_co_write_req_prepare:
    Assertion `child->perm & BLK_PERM_WRITE' failed.

Keep mirror_top_bs drained until all graph changes have completed.

Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
(cherry picked from commit d2da5e288a2e71e82866c8fdefd41b5727300124)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
14 months agoblock/create: Do not abort if a block driver is not available
Philippe Mathieu-Daudé [Wed, 11 Sep 2019 22:08:49 +0000 (00:08 +0200)] 
block/create: Do not abort if a block driver is not available

The 'blockdev-create' QMP command was introduced as experimental
feature in commit b0292b851b8, using the assert() debug call.
It got promoted to 'stable' command in 3fb588a0f2c, but the
assert call was not removed.

Some block drivers are optional, and bdrv_find_format() might
return a NULL value, triggering the assertion.

Stable code is not expected to abort, so return an error instead.

This is easily reproducible when libnfs is not installed:

  ./configure
  [...]
  module support    no
  Block whitelist (rw)
  Block whitelist (ro)
  libiscsi support  yes
  libnfs support    no
  [...]

Start QEMU:

  $ qemu-system-x86_64 -S -qmp unix:/tmp/qemu.qmp,server,nowait

Send the 'blockdev-create' with the 'nfs' driver:

  $ ( cat << 'EOF'
  {'execute': 'qmp_capabilities'}
  {'execute': 'blockdev-create', 'arguments': {'job-id': 'x', 'options': {'size': 0, 'driver': 'nfs', 'location': {'path': '/', 'server': {'host': '::1', 'type': 'inet'}}}}, 'id': 'x'}
  EOF
  ) | socat STDIO UNIX:/tmp/qemu.qmp
  {"QMP": {"version": {"qemu": {"micro": 50, "minor": 1, "major": 4}, "package": "v4.1.0-733-g89ea03a7dc"}, "capabilities": ["oob"]}}
  {"return": {}}

QEMU crashes:

  $ gdb qemu-system-x86_64 core
  Program received signal SIGSEGV, Segmentation fault.
  (gdb) bt
  #0  0x00007ffff510957f in raise () at /lib64/libc.so.6
  #1  0x00007ffff50f3895 in abort () at /lib64/libc.so.6
  #2  0x00007ffff50f3769 in _nl_load_domain.cold.0 () at /lib64/libc.so.6
  #3  0x00007ffff5101a26 in .annobin_assert.c_end () at /lib64/libc.so.6
  #4  0x0000555555d7e1f1 in qmp_blockdev_create (job_id=0x555556baee40 "x", options=0x555557666610, errp=0x7fffffffc770) at block/create.c:69
  #5  0x0000555555c96b52 in qmp_marshal_blockdev_create (args=0x7fffdc003830, ret=0x7fffffffc7f8, errp=0x7fffffffc7f0) at qapi/qapi-commands-block-core.c:1314
  #6  0x0000555555deb0a0 in do_qmp_dispatch (cmds=0x55555645de70 <qmp_commands>, request=0x7fffdc005c70, allow_oob=false, errp=0x7fffffffc898) at qapi/qmp-dispatch.c:131
  #7  0x0000555555deb2a1 in qmp_dispatch (cmds=0x55555645de70 <qmp_commands>, request=0x7fffdc005c70, allow_oob=false) at qapi/qmp-dispatch.c:174

With this patch applied, QEMU returns a QMP error:

  {'execute': 'blockdev-create', 'arguments': {'job-id': 'x', 'options': {'size': 0, 'driver': 'nfs', 'location': {'path': '/', 'server': {'host': '::1', 'type': 'inet'}}}}, 'id': 'x'}
  {"id": "x", "error": {"class": "GenericError", "desc": "Block driver 'nfs' not found or not supported"}}

Cc: qemu-stable@nongnu.org
Reported-by: Xu Tian <xutian@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit d90d5cae2b10efc0e8d0b3cc91ff16201853d3ba)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
14 months agovhost: Fix memory region section comparison
Dr. David Alan Gilbert [Wed, 14 Aug 2019 17:55:35 +0000 (18:55 +0100)] 
vhost: Fix memory region section comparison

Using memcmp to compare structures wasn't safe,
as I found out on ARM when I was getting falce miscompares.

Use the helper function for comparing the MRSs.

Fixes: ade6d081fc33948e56e6 ("vhost: Regenerate region list from changed sections list")
Cc: qemu-stable@nongnu.org
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20190814175535.2023-4-dgilbert@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 3fc4a64cbaed2ddee4c60ddc06740b320e18ab82)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
14 months agomemory: Provide an equality function for MemoryRegionSections
Dr. David Alan Gilbert [Wed, 14 Aug 2019 17:55:34 +0000 (18:55 +0100)] 
memory: Provide an equality function for MemoryRegionSections

Provide a comparison function that checks all the fields are the same.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20190814175535.2023-3-dgilbert@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 9366cf02e4e31c2a8128904d4d8290a0fad5f888)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
14 months agomemory: Align MemoryRegionSections fields
Dr. David Alan Gilbert [Wed, 14 Aug 2019 17:55:33 +0000 (18:55 +0100)] 
memory: Align MemoryRegionSections fields

MemoryRegionSection includes an Int128 'size' field;
on some platforms the compiler causes an alignment of this to
a 128bit boundary, leaving 8 bytes of dead space.
This deadspace can be filled with junk.

Move the size field to the top avoiding unnecessary alignment.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20190814175535.2023-2-dgilbert@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 44f85d3276397cfa2cfa379c61430405dad4e644)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
14 months agotests: make filemonitor test more robust to event ordering
Daniel P. Berrangé [Wed, 21 Aug 2019 15:14:27 +0000 (16:14 +0100)] 
tests: make filemonitor test more robust to event ordering

The ordering of events that are emitted during the rmdir
test have changed with kernel >= 5.3. Semantically both
new & old orderings are correct, so we must be able to
cope with either.

To cope with this, when we see an unexpected event, we
push it back onto the queue and look and the subsequent
event to see if that matches instead.

Tested-by: Peter Xu <peterx@redhat.com>
Tested-by: Wei Yang <richardw.yang@linux.intel.com>
Tested-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
(cherry picked from commit bf9e0313c27d8e6ecd7f7de3d63e1cb25d8f6311)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
14 months agoblock: posix: Always allocate the first block
Nir Soffer [Tue, 27 Aug 2019 01:05:27 +0000 (04:05 +0300)] 
block: posix: Always allocate the first block

When creating an image with preallocation "off" or "falloc", the first
block of the image is typically not allocated. When using Gluster
storage backed by XFS filesystem, reading this block using direct I/O
succeeds regardless of request length, fooling alignment detection.

In this case we fallback to a safe value (4096) instead of the optimal
value (512), which may lead to unneeded data copying when aligning
requests.  Allocating the first block avoids the fallback.

Since we allocate the first block even with preallocation=off, we no
longer create images with zero disk size:

    $ ./qemu-img create -f raw test.raw 1g
    Formatting 'test.raw', fmt=raw size=1073741824

    $ ls -lhs test.raw
    4.0K -rw-r--r--. 1 nsoffer nsoffer 1.0G Aug 16 23:48 test.raw

And converting the image requires additional cluster:

    $ ./qemu-img measure -f raw -O qcow2 test.raw
    required size: 458752
    fully allocated size: 1074135040

When using format like vmdk with multiple files per image, we allocate
one block per file:

    $ ./qemu-img create -f vmdk -o subformat=twoGbMaxExtentFlat test.vmdk 4g
    Formatting 'test.vmdk', fmt=vmdk size=4294967296 compat6=off hwversion=undefined subformat=twoGbMaxExtentFlat

    $ ls -lhs test*.vmdk
    4.0K -rw-r--r--. 1 nsoffer nsoffer 2.0G Aug 27 03:23 test-f001.vmdk
    4.0K -rw-r--r--. 1 nsoffer nsoffer 2.0G Aug 27 03:23 test-f002.vmdk
    4.0K -rw-r--r--. 1 nsoffer nsoffer  353 Aug 27 03:23 test.vmdk

I did quick performance test for copying disks with qemu-img convert to
new raw target image to Gluster storage with sector size of 512 bytes:

    for i in $(seq 10); do
        rm -f dst.raw
        sleep 10
        time ./qemu-img convert -f raw -O raw -t none -T none src.raw dst.raw
    done

Here is a table comparing the total time spent:

Type    Before(s)   After(s)    Diff(%)
---------------------------------------
real      530.028    469.123      -11.4
user       17.204     10.768      -37.4
sys        17.881      7.011      -60.7

We can see very clear improvement in CPU usage.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Message-id: 20190827010528.8818-2-nsoffer@redhat.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
(cherry picked from commit 3a20013fbb26d2a1bd11ef148eefdb1508783787)

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
14 months agofile-posix: Handle undetectable alignment
Nir Soffer [Tue, 13 Aug 2019 18:21:03 +0000 (21:21 +0300)] 
file-posix: Handle undetectable alignment

In some cases buf_align or request_alignment cannot be detected:

1. With Gluster, buf_align cannot be detected since the actual I/O is
   done on Gluster server, and qemu buffer alignment does not matter.
   Since we don't have alignment requirement, buf_align=1 is the best
   value.

2. With local XFS filesystem, buf_align cannot be detected if reading
   from unallocated area. In this we must align the buffer, but we don't
   know what is the correct size. Using the wrong alignment results in
   I/O error.

3. With Gluster backed by XFS, request_alignment cannot be detected if
   reading from unallocated area. In this case we need to use the
   correct alignment, and failing to do so results in I/O errors.

4. With NFS, the server does not use direct I/O, so both buf_align cannot
   be detected. In this case we don't need any alignment so we can use
   buf_align=1 and request_alignment=1.

These cases seems to work when storage sector size is 512 bytes, because
the current code starts checking align=512. If the check succeeds
because alignment cannot be detected we use 512. But this does not work
for storage with 4k sector size.

To determine if we can detect the alignment, we probe first with
align=1. If probing succeeds, maybe there are no alignment requirement
(cases 1, 4) or we are probing unallocated area (cases 2, 3). Since we
don't have any way to tell, we treat this as undetectable alignment. If
probing with align=1 fails with EINVAL, but probing with one of the
expected alignments succeeds, we know that we found a working alignment.

Practically the alignment requirements are the same for buffer
alignment, buffer length, and offset in file. So in case we cannot
detect buf_align, we can use request alignment. If we cannot detect
request alignment, we can fallback to a safe value. To use this logic,
we probe first request alignment instead of buf_align.

Here is a table showing the behaviour with current code (the value in
parenthesis is the optimal value).

Case    Sector    buf_align (opt)   request_alignment (opt)     result
======================================================================
1       512       512   (1)          512   (512)                 OK
1       4096      512   (1)          4096  (4096)                FAIL
----------------------------------------------------------------------
2       512       512   (512)        512   (512)                 OK
2       4096      512   (4096)       4096  (4096)                FAIL
----------------------------------------------------------------------
3       512       512   (1)          512   (512)                 OK
3       4096      512   (1)          512   (4096)                FAIL
----------------------------------------------------------------------
4       512       512   (1)          512   (1)                   OK
4       4096      512   (1)          512   (1)                   OK

Same cases with this change:

Case    Sector    buf_align (opt)   request_alignment (opt)     result
======================================================================
1       512       512   (1)          512   (512)                 OK
1       4096      4096  (1)          4096  (4096)                OK
----------------------------------------------------------------------
2       512       512   (512)        512   (512)                 OK
2       4096      4096  (4096)       4096  (4096)                OK
----------------------------------------------------------------------
3       512       4096  (1)          4096  (512)                 OK
3       4096      4096  (1)          4096  (4096)                OK
----------------------------------------------------------------------
4       512       4096  (1)          4096  (1)                   OK
4       4096      4096  (1)          4096  (1)                   OK

I tested that provisioning VMs and copying disks on local XFS and
Gluster with 4k bytes sector size work now, resolving bugs [1],[2].
I tested also on XFS, NFS, Gluster with 512 bytes sector size.

[1] https://bugzilla.redhat.com/1737256
[2] https://bugzilla.redhat.com/1738657

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit a6b257a08e3d72219f03e461a52152672fec0612)

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
14 months agoblock/file-posix: Let post-EOF fallocate serialize
Max Reitz [Fri, 1 Nov 2019 15:25:10 +0000 (16:25 +0100)] 
block/file-posix: Let post-EOF fallocate serialize

The XFS kernel driver has a bug that may cause data corruption for qcow2
images as of qemu commit c8bb23cbdbe32f.  We can work around it by
treating post-EOF fallocates as serializing up until infinity (INT64_MAX
in practice).

Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20191101152510.11719-4-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
(cherry picked from commit 292d06b925b2787ee6f2430996b95651cae42fce)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
14 months agoblock: Add bdrv_co_get_self_request()
Max Reitz [Fri, 1 Nov 2019 15:25:09 +0000 (16:25 +0100)] 
block: Add bdrv_co_get_self_request()

Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20191101152510.11719-3-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
(cherry picked from commit c28107e9e55b11cd35cf3dc2505e3e69d10dcf13)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
14 months agoblock: Make wait/mark serialising requests public
Max Reitz [Fri, 1 Nov 2019 15:25:08 +0000 (16:25 +0100)] 
block: Make wait/mark serialising requests public

Make both bdrv_mark_request_serialising() and
bdrv_wait_serialising_requests() public so they can be used from block
drivers.

Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20191101152510.11719-2-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
(cherry picked from commit 304d9d7f034ff7f5e1e66a65b7f720f63a72c57e)
 Conflicts:
block/io.c
*drop context dependency on 1acc3466a2
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
14 months agoblock/io: refactor padding
Vladimir Sementsov-Ogievskiy [Tue, 4 Jun 2019 16:15:05 +0000 (19:15 +0300)] 
block/io: refactor padding

We have similar padding code in bdrv_co_pwritev,
bdrv_co_do_pwrite_zeroes and bdrv_co_preadv. Let's combine and unify
it.

[Squashed in Vladimir's qemu-iotests 077 fix
--Stefan]

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20190604161514.262241-4-vsementsov@virtuozzo.com
Message-Id: <20190604161514.262241-4-vsementsov@virtuozzo.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit 7a3f542fbdfd799be4fa6f8b96dc8c1e6933fce4)
*prereq for 292d06b9
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
14 months agoutil/iov: improve qemu_iovec_is_zero
Vladimir Sementsov-Ogievskiy [Tue, 4 Jun 2019 16:15:04 +0000 (19:15 +0300)] 
util/iov: improve qemu_iovec_is_zero

We'll need to check a part of qiov soon, so implement it now.

Optimization with align down to 4 * sizeof(long) is dropped due to:
1. It is strange: it aligns length of the buffer, but where is a
   guarantee that buffer pointer is aligned itself?
2. buffer_is_zero() is a better place for optimizations and it has
   them.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20190604161514.262241-3-vsementsov@virtuozzo.com
Message-Id: <20190604161514.262241-3-vsementsov@virtuozzo.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit f76889e7b947d896db51be8a4d9c941c2f70365a)
*prereq for 292d06b9
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
14 months agoutil/iov: introduce qemu_iovec_init_extended
Vladimir Sementsov-Ogievskiy [Tue, 4 Jun 2019 16:15:03 +0000 (19:15 +0300)] 
util/iov: introduce qemu_iovec_init_extended

Introduce new initialization API, to create requests with padding. Will
be used in the following patch. New API uses qemu_iovec_init_buf if
resulting io vector has only one element, to avoid extra allocations.
So, we need to update qemu_iovec_destroy to support destroying such
QIOVs.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20190604161514.262241-2-vsementsov@virtuozzo.com
Message-Id: <20190604161514.262241-2-vsementsov@virtuozzo.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit d953169d4840f312d3b9a54952f4a7ccfcb3b311)
*prereq for 292d06b9
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
14 months agoqcow2-bitmap: Fix uint64_t left-shift overflow
Tuguoyi [Fri, 1 Nov 2019 07:37:35 +0000 (07:37 +0000)] 
qcow2-bitmap: Fix uint64_t left-shift overflow

There are two issues in In check_constraints_on_bitmap(),
1) The sanity check on the granularity will cause uint64_t
integer left-shift overflow when cluster_size is 2M and the
granularity is BIGGER than 32K.
2) The way to calculate image size that the maximum bitmap
supported can map to is a bit incorrect.
This patch fix it by add a helper function to calculate the
number of bytes needed by a normal bitmap in image and compare
it to the maximum bitmap bytes supported by qemu.

Fixes: 5f72826e7fc62167cf3a
Signed-off-by: Guoyi Tu <tu.guoyi@h3c.com>
Message-id: 4ba40cd1e7ee4a708b40899952e49f22@h3c.com
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
(cherry picked from commit 570542ecb11e04b61ef4b3f4d0965a6915232a88)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
14 months agoiotests: Add peek_file* functions
Max Reitz [Fri, 11 Oct 2019 15:28:13 +0000 (17:28 +0200)] 
iotests: Add peek_file* functions

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20191011152814.14791-16-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
(cherry picked from commit fc8ba423ca6b37bee56ec9dc339b44043c39553d)
*prereq for 570542ec
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
14 months agoiotests: Add test for 4G+ compressed qcow2 write
Max Reitz [Mon, 28 Oct 2019 16:18:41 +0000 (17:18 +0100)] 
iotests: Add test for 4G+ compressed qcow2 write

Test what qemu-img check says about an image after one has written
compressed data to an offset above 4 GB.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20191028161841.1198-3-mreitz@redhat.com
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
(cherry picked from commit b7cd2c11f76d27930f53d3cf26d7b695c78d613b)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
14 months agoqcow2: Fix QCOW2_COMPRESSED_SECTOR_MASK
Max Reitz [Mon, 28 Oct 2019 16:18:40 +0000 (17:18 +0100)] 
qcow2: Fix QCOW2_COMPRESSED_SECTOR_MASK

Masks for L2 table entries should have 64 bit.

Fixes: b6c246942b14d3e0dec46a6c5868ed84e7dbea19
Buglink: https://bugs.launchpad.net/qemu/+bug/1850000
Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20191028161841.1198-2-mreitz@redhat.com
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
(cherry picked from commit 24552feb6ae2f615b76c2b95394af43901f75046)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
14 months agovirtio-blk: Cancel the pending BH when the dataplane is reset
Philippe Mathieu-Daudé [Fri, 16 Aug 2019 17:15:03 +0000 (19:15 +0200)] 
virtio-blk: Cancel the pending BH when the dataplane is reset

When 'system_reset' is called, the main loop clear the memory
region cache before the BH has a chance to execute. Later when
the deferred function is called, some assumptions that were
made when scheduling them are no longer true when they actually
execute.

This is what happens using a virtio-blk device (fresh RHEL7.8 install):

 $ (sleep 12.3; echo system_reset; sleep 12.3; echo system_reset; sleep 1; echo q) \
   | qemu-system-x86_64 -m 4G -smp 8 -boot menu=on \
     -device virtio-blk-pci,id=image1,drive=drive_image1 \
     -drive file=/var/lib/libvirt/images/rhel78.qcow2,if=none,id=drive_image1,format=qcow2,cache=none \
     -device virtio-net-pci,netdev=net0,id=nic0,mac=52:54:00:c4:e7:84 \
     -netdev tap,id=net0,script=/bin/true,downscript=/bin/true,vhost=on \
     -monitor stdio -serial null -nographic
  (qemu) system_reset
  (qemu) system_reset
  (qemu) qemu-system-x86_64: hw/virtio/virtio.c:225: vring_get_region_caches: Assertion `caches != NULL' failed.
  Aborted

  (gdb) bt
  Thread 1 (Thread 0x7f109c17b680 (LWP 10939)):
  #0  0x00005604083296d1 in vring_get_region_caches (vq=0x56040a24bdd0) at hw/virtio/virtio.c:227
  #1  0x000056040832972b in vring_avail_flags (vq=0x56040a24bdd0) at hw/virtio/virtio.c:235
  #2  0x000056040832d13d in virtio_should_notify (vdev=0x56040a240630, vq=0x56040a24bdd0) at hw/virtio/virtio.c:1648
  #3  0x000056040832d1f8 in virtio_notify_irqfd (vdev=0x56040a240630, vq=0x56040a24bdd0) at hw/virtio/virtio.c:1662
  #4  0x00005604082d213d in notify_guest_bh (opaque=0x56040a243ec0) at hw/block/dataplane/virtio-blk.c:75
  #5  0x000056040883dc35 in aio_bh_call (bh=0x56040a243f10) at util/async.c:90
  #6  0x000056040883dccd in aio_bh_poll (ctx=0x560409161980) at util/async.c:118
  #7  0x0000560408842af7 in aio_dispatch (ctx=0x560409161980) at util/aio-posix.c:460
  #8  0x000056040883e068 in aio_ctx_dispatch (source=0x560409161980, callback=0x0, user_data=0x0) at util/async.c:261
  #9  0x00007f10a8fca06d in g_main_context_dispatch () at /lib64/libglib-2.0.so.0
  #10 0x0000560408841445 in glib_pollfds_poll () at util/main-loop.c:215
  #11 0x00005604088414bf in os_host_main_loop_wait (timeout=0) at util/main-loop.c:238
  #12 0x00005604088415c4 in main_loop_wait (nonblocking=0) at util/main-loop.c:514
  #13 0x0000560408416b1e in main_loop () at vl.c:1923
  #14 0x000056040841e0e8 in main (argc=20, argv=0x7ffc2c3f9c58, envp=0x7ffc2c3f9d00) at vl.c:4578

Fix this by cancelling the BH when the virtio dataplane is stopped.

[This is version of the patch was modified as discussed with Philippe on
the mailing list thread.
--Stefan]

Reported-by: Yihuang Yu <yihyu@redhat.com>
Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
Fixes: https://bugs.launchpad.net/qemu/+bug/1839428
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20190816171503.24761-1-philmd@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit ebb6ff25cd888a52a64a9adc3692541c6d1d9a42)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
14 months agoscsi: lsi: exit infinite loop while executing script (CVE-2019-12068)
Paolo Bonzini [Wed, 14 Aug 2019 12:05:21 +0000 (17:35 +0530)] 
scsi: lsi: exit infinite loop while executing script (CVE-2019-12068)

When executing script in lsi_execute_script(), the LSI scsi adapter
emulator advances 's->dsp' index to read next opcode. This can lead
to an infinite loop if the next opcode is empty. Move the existing
loop exit after 10k iterations so that it covers no-op opcodes as
well.

Reported-by: Bugs SysSec <bugs-syssec@rub.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit de594e47659029316bbf9391efb79da0a1a08e08)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
14 months agotarget/xtensa: regenerate and re-import test_mmuhifi_c3 core
Max Filippov [Wed, 9 Oct 2019 02:03:33 +0000 (19:03 -0700)] 
target/xtensa: regenerate and re-import test_mmuhifi_c3 core

Overlay part of the test_mmuhifi_c3 core has GPL3 copyright headers in
it. Fix that by regenerating test_mmuhifi_c3 core overlay and
re-importing it.

Fixes: d848ea776728 ("target/xtensa: add test_mmuhifi_c3 core")
Reported-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
(cherry picked from commit d5eaec84e592bb0085f84bef54d0a41e31faa99a)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
14 months agotarget/arm: Allow reading flags from FPSCR for M-profile
Christophe Lyon [Fri, 25 Oct 2019 09:57:11 +0000 (11:57 +0200)] 
target/arm: Allow reading flags from FPSCR for M-profile

rt==15 is a special case when reading the flags: it means the
destination is APSR. This patch avoids rejecting
vmrs apsr_nzcv, fpscr
as illegal instruction.

Cc: qemu-stable@nongnu.org
Signed-off-by: Christophe Lyon <christophe.lyon@linaro.org>
Message-id: 20191025095711.10853-1-christophe.lyon@linaro.org
[PMM: updated the comment]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
(cherry picked from commit 2529ab43b8a05534494704e803e0332d111d8b91)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
14 months agohbitmap: handle set/reset with zero length
Vladimir Sementsov-Ogievskiy [Fri, 11 Oct 2019 09:07:07 +0000 (12:07 +0300)] 
hbitmap: handle set/reset with zero length

Passing zero length to these functions leads to unpredicted results.
Zero-length set/reset may occur in active-mirror, on zero-length write
(which is unlikely, but not guaranteed to never happen).

Let's just do nothing on zero-length request.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20191011090711.19940-2-vsementsov@virtuozzo.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
(cherry picked from commit fed33bd175f663cc8c13f8a490a4f35a19756cfe)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
14 months agoutil/hbitmap: strict hbitmap_reset
Vladimir Sementsov-Ogievskiy [Tue, 6 Aug 2019 15:26:11 +0000 (18:26 +0300)] 
util/hbitmap: strict hbitmap_reset

hbitmap_reset has an unobvious property: it rounds requested region up.
It may provoke bugs, like in recently fixed write-blocking mode of
mirror: user calls reset on unaligned region, not keeping in mind that
there are possible unrelated dirty bytes, covered by rounded-up region
and information of this unrelated "dirtiness" will be lost.

Make hbitmap_reset strict: assert that arguments are aligned, allowing
only one exception when @start + @count == hb->orig_size. It's needed
to comfort users of hbitmap_next_dirty_area, which cares about
hb->orig_size.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20190806152611.280389-1-vsementsov@virtuozzo.com>
[Maintainer edit: Max's suggestions from on-list. --js]
[Maintainer edit: Eric's suggestion for aligned macro. --js]
Signed-off-by: John Snow <jsnow@redhat.com>
(cherry picked from commit 48557b138383aaf69c2617ca9a88bfb394fc50ec)
*prereq for fed33bd175f663cc8c13f8a490a4f35a19756cfe
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
14 months agoCOLO-compare: Fix incorrect `if` logic
Fan Yang [Tue, 24 Sep 2019 14:08:29 +0000 (22:08 +0800)] 
COLO-compare: Fix incorrect `if` logic

'colo_mark_tcp_pkt' should return 'true' when packets are the same, and
'false' otherwise.  However, it returns 'true' when
'colo_compare_packet_payload' returns non-zero while
'colo_compare_packet_payload' is just a 'memcmp'.  The result is that
COLO-compare reports inconsistent TCP packets when they are actually
the same.

Fixes: f449c9e549c ("colo: compare the packet based on the tcp sequence number")
Cc: qemu-stable@nongnu.org
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Fan Yang <Fan_Yang@sjtu.edu.cn>
Signed-off-by: Jason Wang <jasowang@redhat.com>
(cherry picked from commit 1e907a32b77e5d418538453df5945242e43224fa)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
14 months agovirtio-net: prevent offloads reset on migration
Mikhail Sennikovsky [Fri, 11 Oct 2019 13:58:04 +0000 (15:58 +0200)] 
virtio-net: prevent offloads reset on migration

Currently offloads disabled by guest via the VIRTIO_NET_CTRL_GUEST_OFFLOADS_SET
command are not preserved on VM migration.
Instead all offloads reported by guest features (via VIRTIO_PCI_GUEST_FEATURES)
get enabled.
What happens is: first the VirtIONet::curr_guest_offloads gets restored and offloads
are getting set correctly:

 #0  qemu_set_offload (nc=0x555556a11400, csum=1, tso4=0, tso6=0, ecn=0, ufo=0) at net/net.c:474
 #1  virtio_net_apply_guest_offloads (n=0x555557701ca0) at hw/net/virtio-net.c:720
 #2  virtio_net_post_load_device (opaque=0x555557701ca0, version_id=11) at hw/net/virtio-net.c:2334
 #3  vmstate_load_state (f=0x5555569dc010, vmsd=0x555556577c80 <vmstate_virtio_net_device>, opaque=0x555557701ca0, version_id=11)
     at migration/vmstate.c:168
 #4  virtio_load (vdev=0x555557701ca0, f=0x5555569dc010, version_id=11) at hw/virtio/virtio.c:2197
 #5  virtio_device_get (f=0x5555569dc010, opaque=0x555557701ca0, size=0, field=0x55555668cd00 <__compound_literal.5>) at hw/virtio/virtio.c:2036
 #6  vmstate_load_state (f=0x5555569dc010, vmsd=0x555556577ce0 <vmstate_virtio_net>, opaque=0x555557701ca0, version_id=11) at migration/vmstate.c:143
 #7  vmstate_load (f=0x5555569dc010, se=0x5555578189e0) at migration/savevm.c:829
 #8  qemu_loadvm_section_start_full (f=0x5555569dc010, mis=0x5555569eee20) at migration/savevm.c:2211
 #9  qemu_loadvm_state_main (f=0x5555569dc010, mis=0x5555569eee20) at migration/savevm.c:2395
 #10 qemu_loadvm_state (f=0x5555569dc010) at migration/savevm.c:2467
 #11 process_incoming_migration_co (opaque=0x0) at migration/migration.c:449

However later on the features are getting restored, and offloads get reset to
everything supported by features:

 #0  qemu_set_offload (nc=0x555556a11400, csum=1, tso4=1, tso6=1, ecn=0, ufo=0) at net/net.c:474
 #1  virtio_net_apply_guest_offloads (n=0x555557701ca0) at hw/net/virtio-net.c:720
 #2  virtio_net_set_features (vdev=0x555557701ca0, features=5104441767) at hw/net/virtio-net.c:773
 #3  virtio_set_features_nocheck (vdev=0x555557701ca0, val=5104441767) at hw/virtio/virtio.c:2052
 #4  virtio_load (vdev=0x555557701ca0, f=0x5555569dc010, version_id=11) at hw/virtio/virtio.c:2220
 #5  virtio_device_get (f=0x5555569dc010, opaque=0x555557701ca0, size=0, field=0x55555668cd00 <__compound_literal.5>) at hw/virtio/virtio.c:2036
 #6  vmstate_load_state (f=0x5555569dc010, vmsd=0x555556577ce0 <vmstate_virtio_net>, opaque=0x555557701ca0, version_id=11) at migration/vmstate.c:143
 #7  vmstate_load (f=0x5555569dc010, se=0x5555578189e0) at migration/savevm.c:829
 #8  qemu_loadvm_section_start_full (f=0x5555569dc010, mis=0x5555569eee20) at migration/savevm.c:2211
 #9  qemu_loadvm_state_main (f=0x5555569dc010, mis=0x5555569eee20) at migration/savevm.c:2395
 #10 qemu_loadvm_state (f=0x5555569dc010) at migration/savevm.c:2467
 #11 process_incoming_migration_co (opaque=0x0) at migration/migration.c:449

Fix this by preserving the state in saved_guest_offloads field and
pushing out offload initialization to the new post load hook.

Cc: qemu-stable@nongnu.org
Signed-off-by: Mikhail Sennikovsky <mikhail.sennikovskii@cloud.ionos.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
(cherry picked from commit 7788c3f2e21e35902d45809b236791383bbb613e)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
14 months agovirtio: new post_load hook
Michael S. Tsirkin [Fri, 11 Oct 2019 13:58:03 +0000 (15:58 +0200)] 
virtio: new post_load hook

Post load hook in virtio vmsd is called early while device is processed,
and when VirtIODevice core isn't fully initialized.  Most device
specific code isn't ready to deal with a device in such state, and
behaves weirdly.

Add a new post_load hook in a device class instead.  Devices should use
this unless they specifically want to verify the migration stream as
it's processed, e.g. for bounds checking.

Cc: qemu-stable@nongnu.org
Suggested-by: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Mikhail Sennikovsky <mikhail.sennikovskii@cloud.ionos.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
(cherry picked from commit 1dd713837cac8ec5a97d3b8492d72ce5ac94803c)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
14 months agoui: Fix hanging up Cocoa display on macOS 10.15 (Catalina)
Hikaru Nishida [Tue, 15 Oct 2019 01:07:34 +0000 (10:07 +0900)] 
ui: Fix hanging up Cocoa display on macOS 10.15 (Catalina)

macOS API documentation says that before applicationDidFinishLaunching
is called, any events will not be processed. However, some events are
fired before it is called in macOS Catalina. This causes deadlock of
iothread_lock in handleEvent while it will be released after the
app_started_sem is posted.
This patch avoids processing events before the app_started_sem is
posted to prevent this deadlock.

Buglink: https://bugs.launchpad.net/qemu/+bug/1847906
Signed-off-by: Hikaru Nishida <hikarupsp@gmail.com>
Message-id: 20191015010734.85229-1-hikarupsp@gmail.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit dff742ad27efa474ec04accdbf422c9acfd3e30e)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
14 months agomirror: Do not dereference invalid pointers
Max Reitz [Mon, 14 Oct 2019 15:39:28 +0000 (17:39 +0200)] 
mirror: Do not dereference invalid pointers

mirror_exit_common() may be called twice (if it is called from
mirror_prepare() and fails, it will be called from mirror_abort()
again).

In such a case, many of the pointers in the MirrorBlockJob object will
already be freed.  This can be seen most reliably for s->target, which
is set to NULL (and then dereferenced by blk_bs()).

Cc: qemu-stable@nongnu.org
Fixes: 737efc1eda23b904fbe0e66b37715fb0e5c3e58b
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20191014153931.20699-2-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
(cherry picked from commit f93c3add3a773e0e3f6277e5517583c4ad3a43c2)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
14 months agoiotests: Test large write request to qcow2 file
Max Reitz [Thu, 10 Oct 2019 10:08:58 +0000 (12:08 +0200)] 
iotests: Test large write request to qcow2 file

Without HEAD^, the following happens when you attempt a large write
request to a qcow2 file such that the number of bytes covered by all
clusters involved in a single allocation will exceed INT_MAX:

(A) handle_alloc_space() decides to fill the whole area with zeroes and
    fails because bdrv_co_pwrite_zeroes() fails (the request is too
    large).

(B) If handle_alloc_space() does not do anything, but merge_cow()
    decides that the requests can be merged, it will create a too long
    IOV that later cannot be written.

(C) Otherwise, all parts will be written separately, so those requests
    will work.

In either B or C, though, qcow2_alloc_cluster_link_l2() will have an
overflow: We use an int (i) to iterate over nb_clusters, and then
calculate the L2 entry based on "i << s->cluster_bits" -- which will
overflow if the range covers more than INT_MAX bytes.  This then leads
to image corruption because the L2 entry will be wrong (it will be
recognized as a compressed cluster).

Even if that were not the case, the .cow_end area would be empty
(because handle_alloc() will cap avail_bytes and nb_bytes at INT_MAX, so
their difference (which is the .cow_end size) will be 0).

So this test checks that on such large requests, the image will not be
corrupted.  Unfortunately, we cannot check whether COW will be handled
correctly, because that data is discarded when it is written to null-co
(but we have to use null-co, because writing 2 GB of data in a test is
not quite reasonable).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit a1406a9262a087d9ec9627b88da13c4590b61dae)
 Conflicts:
tests/qemu-iotests/group
*drop context dep. on tests not in 4.1
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
14 months agoqcow2: Limit total allocation range to INT_MAX
Max Reitz [Thu, 10 Oct 2019 10:08:57 +0000 (12:08 +0200)] 
qcow2: Limit total allocation range to INT_MAX

When the COW areas are included, the size of an allocation can exceed
INT_MAX.  This is kind of limited by handle_alloc() in that it already
caps avail_bytes at INT_MAX, but the number of clusters still reflects
the original length.

This can have all sorts of effects, ranging from the storage layer write
call failing to image corruption.  (If there were no image corruption,
then I suppose there would be data loss because the .cow_end area is
forced to be empty, even though there might be something we need to
COW.)

Fix all of it by limiting nb_clusters so the equivalent number of bytes
will not exceed INT_MAX.

Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit d1b9d19f99586b33795e20a79f645186ccbc070f)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
14 months agohw/core/loader: Fix possible crash in rom_copy()
Thomas Huth [Wed, 25 Sep 2019 12:16:43 +0000 (14:16 +0200)] 
hw/core/loader: Fix possible crash in rom_copy()

Both, "rom->addr" and "addr" are derived from the binary image
that can be loaded with the "-kernel" paramer. The code in
rom_copy() then calculates:

    d = dest + (rom->addr - addr);

and uses "d" as destination in a memcpy() some lines later. Now with
bad kernel images, it is possible that rom->addr is smaller than addr,
thus "rom->addr - addr" gets negative and the memcpy() then tries to
copy contents from the image to a bad memory location. This could
maybe be used to inject code from a kernel image into the QEMU binary,
so we better fix it with an additional sanity check here.

Cc: qemu-stable@nongnu.org
Reported-by: Guangming Liu
Buglink: https://bugs.launchpad.net/qemu/+bug/1844635
Message-Id: <20190925130331.27825-1-thuth@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit e423455c4f23a1a828901c78fe6d03b7dde79319)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
14 months agovhost-user: save features if the char dev is closed
Adrian Moreno [Tue, 24 Sep 2019 16:20:44 +0000 (18:20 +0200)] 
vhost-user: save features if the char dev is closed

That way the state can be correctly restored when the device is opened
again. This might happen if the backend is restarted.

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1738768
Reported-by: Pei Zhang <pezhang@redhat.com>
Fixes: 6ab79a20af3a ("do not call vhost_net_cleanup() on running net from char user event")
Cc: ddstreet@canonical.com
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Message-Id: <20190924162044.11414-1-amorenoz@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit c6beefd674fff8d41b90365dfccad32e53a5abcb)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
14 months agoiotests: Test internal snapshots with -blockdev
Kevin Wolf [Tue, 17 Sep 2019 10:43:42 +0000 (12:43 +0200)] 
iotests: Test internal snapshots with -blockdev

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Tested-by: Peter Krempa <pkrempa@redhat.com>
(cherry picked from commit 92b22e7b1789b0e5f20d245706e72eae70dbddce)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
14 months agoblock/snapshot: Restrict set of snapshot nodes
Kevin Wolf [Tue, 17 Sep 2019 10:26:23 +0000 (12:26 +0200)] 
block/snapshot: Restrict set of snapshot nodes

Nodes involved in internal snapshots were those that were returned by
bdrv_next(), inserted and not read-only. bdrv_next() in turn returns all
nodes that are either the root node of a BlockBackend or monitor-owned
nodes.

With the typical -drive use, this worked well enough. However, in the
typical -blockdev case, the user defines one node per option, making all
nodes monitor-owned nodes. This includes protocol nodes etc. which often
are not snapshottable, so "savevm" only returns an error.

Change the conditions so that internal snapshot still include all nodes
that have a BlockBackend attached (we definitely want to snapshot
anything attached to a guest device and probably also the built-in NBD
server; snapshotting block job BlockBackends is more of an accident, but
a preexisting one), but other monitor-owned nodes are only included if
they have no parents.

This makes internal snapshots usable again with typical -blockdev
configurations.

Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Tested-by: Peter Krempa <pkrempa@redhat.com>
(cherry picked from commit 05f4aced658a02b02d3e89a6c7a2281008fcf26c)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
14 months agos390: PCI: fix IOMMU region init
Matthew Rosato [Thu, 26 Sep 2019 14:10:36 +0000 (10:10 -0400)] 
s390: PCI: fix IOMMU region init

The fix in dbe9cf606c shrinks the IOMMU memory region to a size
that seems reasonable on the surface, however is actually too
small as it is based against a 0-mapped address space.  This
causes breakage with small guests as they can overrun the IOMMU window.

Let's go back to the prior method of initializing iommu for now.

Fixes: dbe9cf606c ("s390x/pci: Set the iommu region size mpcifc request")
Cc: qemu-stable@nongnu.org
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Reported-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Tested-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
Reported-by: Stefan Zimmerman <stzi@linux.ibm.com>
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Message-Id: <1569507036-15314-1-git-send-email-mjrosato@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
(cherry picked from commit 7df1dac5f1c85312474df9cb3a8fcae72303da62)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
14 months agoroms/Makefile.edk2: don't pull in submodules when building from tarball
Michael Roth [Thu, 12 Sep 2019 23:12:02 +0000 (18:12 -0500)] 
roms/Makefile.edk2: don't pull in submodules when building from tarball

Currently the `make efi` target pulls submodules nested under the
roms/edk2 submodule as dependencies. However, when we attempt to build
from a tarball this fails since we are no longer in a git tree.

A preceding patch will pre-populate these submodules in the tarball,
so assume this build dependency is only needed when building from a
git tree.

Cc: Laszlo Ersek <lersek@redhat.com>
Cc: Bruce Rogers <brogers@suse.com>
Cc: qemu-stable@nongnu.org # v4.1.0
Reported-by: Bruce Rogers <brogers@suse.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Message-Id: <20190912231202.12327-3-mdroth@linux.vnet.ibm.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
(cherry picked from commit f3e330e3c319160ac04954399b5a10afc965098c)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
14 months agomake-release: pull in edk2 submodules so we can build it from tarballs
Michael Roth [Thu, 12 Sep 2019 23:12:01 +0000 (18:12 -0500)] 
make-release: pull in edk2 submodules so we can build it from tarballs

The `make efi` target added by 536d2173 is built from the roms/edk2
submodule, which in turn relies on additional submodules nested under
roms/edk2.

The make-release script currently only pulls in top-level submodules,
so these nested submodules are missing in the resulting tarball.

We could try to address this situation more generally by recursively
pulling in all submodules, but this doesn't necessarily ensure the
end-result will build properly (this case also required other changes).

Additionally, due to the nature of submodules, we may not always have
control over how these sorts of things are dealt with, so for now we
continue to handle it on a case-by-case in the make-release script.

Cc: Laszlo Ersek <lersek@redhat.com>
Cc: Bruce Rogers <brogers@suse.com>
Cc: qemu-stable@nongnu.org # v4.1.0
Reported-by: Bruce Rogers <brogers@suse.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Message-Id: <20190912231202.12327-2-mdroth@linux.vnet.ibm.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
(cherry picked from commit 45c61c6c23918e3b05ed9ecac5b2328ebae5f774)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
14 months agohw/arm/boot.c: Set NSACR.{CP11,CP10} for NS kernel boots
Peter Maydell [Fri, 20 Sep 2019 17:40:39 +0000 (18:40 +0100)] 
hw/arm/boot.c: Set NSACR.{CP11,CP10} for NS kernel boots

If we're booting a Linux kernel directly into Non-Secure
state on a CPU which has Secure state, then make sure we
set the NSACR CP11 and CP10 bits, so that Non-Secure is allowed
to access the FPU. Otherwise an AArch32 kernel will UNDEF as
soon as it tries to use the FPU.

It used to not matter that we didn't do this until commit
fc1120a7f5f2d4b6, where we implemented actually honouring
these NSACR bits.

The problem only exists for CPUs where EL3 is AArch32; the
equivalent AArch64 trap bits are in CPTR_EL3 and are "0 to
not trap, 1 to trap", so the reset value of the register
permits NS access, unlike NSACR.

Fixes: fc1120a7f5
Fixes: https://bugs.launchpad.net/qemu/+bug/1844597
Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20190920174039.3916-1-peter.maydell@linaro.org
(cherry picked from commit ece628fcf69cbbd4b3efb6fbd203af07609467a2)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
14 months agoblock/backup: fix backup_cow_with_offload for last cluster
Vladimir Sementsov-Ogievskiy [Fri, 20 Sep 2019 14:20:43 +0000 (17:20 +0300)] 
block/backup: fix backup_cow_with_offload for last cluster

We shouldn't try to copy bytes beyond EOF. Fix it.

Fixes: 9ded4a0114968e
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20190920142056.12778-3-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
(cherry picked from commit 1048ddf0a32dcdaa952e581bd503d49adad527cc)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
14 months agoblock/backup: fix max_transfer handling for copy_range
Vladimir Sementsov-Ogievskiy [Fri, 20 Sep 2019 14:20:42 +0000 (17:20 +0300)] 
block/backup: fix max_transfer handling for copy_range

Of course, QEMU_ALIGN_UP is a typo, it should be QEMU_ALIGN_DOWN, as we
are trying to find aligned size which satisfy both source and target.
Also, don't ignore too small max_transfer. In this case seems safer to
disable copy_range.

Fixes: 9ded4a0114968e
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20190920142056.12778-2-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
(cherry picked from commit 981fb5810aa3f68797ee6e261db338bd78857614)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
14 months agoqcow2: Fix corruption bug in qcow2_detect_metadata_preallocation()
Kevin Wolf [Thu, 24 Oct 2019 14:26:58 +0000 (16:26 +0200)] 
qcow2: Fix corruption bug in qcow2_detect_metadata_preallocation()

qcow2_detect_metadata_preallocation() calls qcow2_get_refcount() which
requires s->lock to be taken to protect its accesses to the refcount
table and refcount blocks. However, nothing in this code path actually
took the lock. This could cause the same cache entry to be used by two
requests at the same time, for different tables at different offsets,
resulting in image corruption.

As it would be preferable to base the detection on consistent data (even
though it's just heuristics), let's take the lock not only around the
qcow2_get_refcount() calls, but around the whole function.

This patch takes the lock in qcow2_co_block_status() earlier and asserts
in qcow2_detect_metadata_preallocation() that we hold the lock.

Fixes: 69f47505ee66afaa513305de0c1895a224e52c45
Cc: qemu-stable@nongnu.org
Reported-by: Michael Weiser <michael.weiser@gmx.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Tested-by: Michael Weiser <michael.weiser@gmx.de>
Reviewed-by: Michael Weiser <michael.weiser@gmx.de>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
(cherry picked from commit 5e9785505210e2477e590e61b1ab100d0ec22b01)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
14 months agocoroutine: Add qemu_co_mutex_assert_locked()
Kevin Wolf [Thu, 24 Oct 2019 14:26:57 +0000 (16:26 +0200)] 
coroutine: Add qemu_co_mutex_assert_locked()

Some functions require that the caller holds a certain CoMutex for them
to operate correctly. Add a function so that they can assert the lock is
really held.

Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Tested-by: Michael Weiser <michael.weiser@gmx.de>
Reviewed-by: Michael Weiser <michael.weiser@gmx.de>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Max Reitz <mreitz@redhat.com>
(cherry picked from commit 944f3d5dd216fcd8cb007eddd4f82dced0a15b3d)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
14 months agoblock/qcow2: Fix corruption introduced by commit 8ac0f15f335
Maxim Levitsky [Sun, 15 Sep 2019 20:36:53 +0000 (23:36 +0300)] 
block/qcow2: Fix corruption introduced by commit 8ac0f15f335

This fixes subtle corruption introduced by luks threaded encryption
in commit 8ac0f15f335

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1745922

The corruption happens when we do a write that
   * writes to two or more unallocated clusters at once
   * doesn't fully cover the first sector
   * doesn't fully cover the last sector
   * uses luks encryption

In this case, when allocating the new clusters we COW both areas
prior to the write and after the write, and we encrypt them.

The above mentioned commit accidentally made it so we encrypt the
second COW area using the physical cluster offset of the first area.

The problem is that offset_in_cluster in do_perform_cow_encrypt
can be larger that the cluster size, thus cluster_offset
will no longer point to the start of the cluster at which encrypted
area starts.

Next patch in this series will refactor the code to avoid all these
assumptions.

In the bugreport that was triggered by rebasing a luks image to new,
zero filled base, which lot of such writes, and causes some files
with zero areas to contain garbage there instead.
But as described above it can happen elsewhere as well

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20190915203655.21638-2-mlevitsk@redhat.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
(cherry picked from commit 38e7d54bdc518b5a05a922467304bcace2396945)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
14 months agoblockjob: update nodes head while removing all bdrv
Sergio Lopez [Wed, 11 Sep 2019 10:03:16 +0000 (12:03 +0200)] 
blockjob: update nodes head while removing all bdrv

block_job_remove_all_bdrv() iterates through job->nodes, calling
bdrv_root_unref_child() for each entry. The call to the latter may
reach child_job_[can_]set_aio_ctx(), which will also attempt to
traverse job->nodes, potentially finding entries that where freed
on previous iterations.

To avoid this situation, update job->nodes head on each iteration to
ensure that already freed entries are no longer linked to the list.

RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1746631
Signed-off-by: Sergio Lopez <slp@redhat.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190911100316.32282-1-mreitz@redhat.com
Reviewed-by: Sergio Lopez <slp@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
(cherry picked from commit d876bf676f5e7c6aa9ac64555e48cba8734ecb2f)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
14 months agocurl: Handle success in multi_check_completion
Max Reitz [Tue, 10 Sep 2019 12:41:35 +0000 (14:41 +0200)] 
curl: Handle success in multi_check_completion

Background: As of cURL 7.59.0, it verifies that several functions are
not called from within a callback.  Among these functions is
curl_multi_add_handle().

curl_read_cb() is a callback from cURL and not a coroutine.  Waking up
acb->co will lead to entering it then and there, which means the current
request will settle and the caller (if it runs in the same coroutine)
may then issue the next request.  In such a case, we will enter
curl_setup_preadv() effectively from within curl_read_cb().

Calling curl_multi_add_handle() will then fail and the new request will
not be processed.

Fix this by not letting curl_read_cb() wake up acb->co.  Instead, leave
the whole business of settling the AIOCB objects to
curl_multi_check_completion() (which is called from our timer callback
and our FD handler, so not from any cURL callbacks).

Reported-by: Natalie Gavrielov <ngavrilo@redhat.com>
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1740193
Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190910124136.10565-7-mreitz@redhat.com
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
(cherry picked from commit bfb23b480a49114315877aacf700b49453e0f9d9)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
14 months agocurl: Report only ready sockets
Max Reitz [Tue, 10 Sep 2019 12:41:34 +0000 (14:41 +0200)] 
curl: Report only ready sockets

Instead of reporting all sockets to cURL, only report the one that has
caused curl_multi_do_locked() to be called.  This lets us get rid of the
QLIST_FOREACH_SAFE() list, which was actually wrong: SAFE foreaches are
only safe when the current element is removed in each iteration.  If it
possible for the list to be concurrently modified, we cannot guarantee
that only the current element will be removed.  Therefore, we must not
use QLIST_FOREACH_SAFE() here.

Fixes: ff5ca1664af85b24a4180d595ea6873fd3deac57
Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190910124136.10565-6-mreitz@redhat.com
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
(cherry picked from commit 9abaf9fc474c3dd53e8e119326abc774c977c331)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
14 months agocurl: Pass CURLSocket to curl_multi_do()
Max Reitz [Tue, 10 Sep 2019 12:41:33 +0000 (14:41 +0200)] 
curl: Pass CURLSocket to curl_multi_do()

curl_multi_do_locked() currently marks all sockets as ready.  That is
not only inefficient, but in fact unsafe (the loop is).  A follow-up
patch will change that, but to do so, curl_multi_do_locked() needs to
know exactly which socket is ready; and that is accomplished by this
patch here.

Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190910124136.10565-5-mreitz@redhat.com
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
(cherry picked from commit 9dbad87d25587ff640ef878f7b6159fc368ff541)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
14 months agocurl: Check completion in curl_multi_do()
Max Reitz [Tue, 10 Sep 2019 12:41:32 +0000 (14:41 +0200)] 
curl: Check completion in curl_multi_do()

While it is more likely that transfers complete after some file
descriptor has data ready to read, we probably should not rely on it.
Better be safe than sorry and call curl_multi_check_completion() in
curl_multi_do(), too, just like it is done in curl_multi_read().

With this change, curl_multi_do() and curl_multi_read() are actually the
same, so drop curl_multi_read() and use curl_multi_do() as the sole FD
handler.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190910124136.10565-4-mreitz@redhat.com
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
(cherry picked from commit 948403bcb1c7e71dcbe8ab8479cf3934a0efcbb5)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
14 months agocurl: Keep *socket until the end of curl_sock_cb()
Max Reitz [Tue, 10 Sep 2019 12:41:31 +0000 (14:41 +0200)] 
curl: Keep *socket until the end of curl_sock_cb()

This does not really change anything, but it makes the code a bit easier
to follow once we use @socket as the opaque pointer for
aio_set_fd_handler().

Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190910124136.10565-3-mreitz@redhat.com
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
(cherry picked from commit 007f339b1099af46a008dac438ca0943e31dba72)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
14 months agocurl: Keep pointer to the CURLState in CURLSocket
Max Reitz [Tue, 10 Sep 2019 12:41:30 +0000 (14:41 +0200)] 
curl: Keep pointer to the CURLState in CURLSocket

A follow-up patch will make curl_multi_do() and curl_multi_read() take a
CURLSocket instead of the CURLState.  They still need the latter,
though, so add a pointer to it to the former.

Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20190910124136.10565-2-mreitz@redhat.com
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
(cherry picked from commit 0487861685294660b23bc146e1ebd5304aa8bbe0)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
14 months agoblock/nfs: tear down aio before nfs_close
Peter Lieven [Tue, 10 Sep 2019 15:41:09 +0000 (17:41 +0200)] 
block/nfs: tear down aio before nfs_close

nfs_close is a sync call from libnfs and has its own event
handler polling on the nfs FD. Avoid that both QEMU and libnfs
are intefering here.

CC: qemu-stable@nongnu.org
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 601dc6559725f7a614b6f893611e17ff0908e914)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
14 months agoqcow2: Fix the calculation of the maximum L2 cache size
Alberto Garcia [Fri, 16 Aug 2019 12:17:42 +0000 (15:17 +0300)] 
qcow2: Fix the calculation of the maximum L2 cache size

The size of the qcow2 L2 cache defaults to 32 MB, which can be easily
larger than the maximum amount of L2 metadata that the image can have.
For example: with 64 KB clusters the user would need a qcow2 image
with a virtual size of 256 GB in order to have 32 MB of L2 metadata.

Because of that, since commit b749562d9822d14ef69c9eaa5f85903010b86c30
we forbid the L2 cache to become larger than the maximum amount of L2
metadata for the image, calculated using this formula:

    uint64_t max_l2_cache = virtual_disk_size / (s->cluster_size / 8);

The problem with this formula is that the result should be rounded up
to the cluster size because an L2 table on disk always takes one full
cluster.

For example, a 1280 MB qcow2 image with 64 KB clusters needs exactly
160 KB of L2 metadata, but we need 192 KB on disk (3 clusters) even if
the last 32 KB of those are not going to be used.

However QEMU rounds the numbers down and only creates 2 cache tables
(128 KB), which is not enough for the image.

A quick test doing 4KB random writes on a 1280 MB image gives me
around 500 IOPS, while with the correct cache size I get 16K IOPS.

Cc: qemu-stable@nongnu.org
Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit b70d08205b2e4044c529eefc21df2c8ab61b473b)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
14 months agolibvhost-user: fix SLAVE_SEND_FD handling
Johannes Berg [Tue, 3 Sep 2019 20:04:22 +0000 (23:04 +0300)] 
libvhost-user: fix SLAVE_SEND_FD handling

It doesn't look like this could possibly work properly since
VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD is defined to 10, but the
dev->protocol_features has a bitmap. I suppose the peer this
was tested with also supported VHOST_USER_PROTOCOL_F_LOG_SHMFD,
in which case the test would always be false, but nevertheless
the code seems wrong.

Use has_feature() to fix this.

Fixes: d84599f56c82 ("libvhost-user: support host notifier")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Message-Id: <20190903200422.11693-1-johannes@sipsolutions.net>
Reviewed-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 8726b70b449896f1211f869ec4f608904f027207)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
14 months agotarget/arm: Don't abort on M-profile exception return in linux-user mode
Peter Maydell [Thu, 22 Aug 2019 13:15:34 +0000 (14:15 +0100)] 
target/arm: Don't abort on M-profile exception return in linux-user mode

An attempt to do an exception-return (branch to one of the magic
addresses) in linux-user mode for M-profile should behave like
a normal branch, because linux-user mode is always going to be
in 'handler' mode. This used to work, but we broke it when we added
support for the M-profile security extension in commit d02a8698d7ae2bfed.

In that commit we allowed even handler-mode calls to magic return
values to be checked for and dealt with by causing an
EXCP_EXCEPTION_EXIT exception to be taken, because this is
needed for the FNC_RETURN return-from-non-secure-function-call
handling. For system mode we added a check in do_v7m_exception_exit()
to make any spurious calls from Handler mode behave correctly, but
forgot that linux-user mode would also be affected.

How an attempted return-from-non-secure-function-call in linux-user
mode should be handled is not clear -- on real hardware it would
result in return to secure code (not to the Linux kernel) which
could then handle the error in any way it chose. For QEMU we take
the simple approach of treating this erroneous return the same way
it would be handled on a CPU without the security extensions --
treat it as a normal branch.

The upshot of all this is that for linux-user mode we should never
do any of the bx_excret magic, so the code change is simple.

This ought to be a weird corner case that only affects broken guest
code (because Linux user processes should never be attempting to do
exception returns or NS function returns), except that the code that
assigns addresses in RAM for the process and stack in our linux-user
code does not attempt to avoid this magic address range, so
legitimate code attempting to return to a trampoline routine on the
stack can fall into this case. This change fixes those programs,
but we should also look at restricting the range of memory we
use for M-profile linux-user guests to the area that would be
real RAM in hardware.

Cc: qemu-stable@nongnu.org
Reported-by: Christophe Lyon <christophe.lyon@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20190822131534.16602-1-peter.maydell@linaro.org
Fixes: https://bugs.launchpad.net/qemu/+bug/1840922
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
(cherry picked from commit 5e5584c89f36b302c666bc6db535fd3f7ff35ad2)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
14 months agotarget/arm: Free TCG temps in trans_VMOV_64_sp()
Peter Maydell [Tue, 27 Aug 2019 12:19:31 +0000 (13:19 +0100)] 
target/arm: Free TCG temps in trans_VMOV_64_sp()

The function neon_store_reg32() doesn't free the TCG temp that it
is passed, so the caller must do that. We got this right in most
places but forgot to free the TCG temps in trans_VMOV_64_sp().

Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20190827121931.26836-1-peter.maydell@linaro.org
(cherry picked from commit 342d27581bd3ecdb995e4fc55fcd383cf3242888)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
14 months agoiotests: Test blockdev-create for vpc
Max Reitz [Mon, 2 Sep 2019 19:33:20 +0000 (21:33 +0200)] 
iotests: Test blockdev-create for vpc

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit cb73747e1a47b93d3dfdc3f769c576b053916938)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
14 months agoiotests: Restrict nbd Python tests to nbd
Max Reitz [Mon, 2 Sep 2019 19:33:19 +0000 (21:33 +0200)] 
iotests: Restrict nbd Python tests to nbd

We have two Python unittest-style tests that test NBD.  As such, they
should specify supported_protocols=['nbd'] so they are skipped when the
user wants to test some other protocol.

Furthermore, we should restrict their choice of formats to 'raw'.  The
idea of a protocol/format combination is to use some format over some
protocol; but we always use the raw format over NBD.  It does not really
matter what the NBD server uses on its end, and it is not a useful test
of the respective format driver anyway.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 7c932a1d69a6d6ac5c0b615c11d191da3bbe9aa8)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
14 months agoiotests: Restrict file Python tests to file
Max Reitz [Mon, 2 Sep 2019 19:33:18 +0000 (21:33 +0200)] 
iotests: Restrict file Python tests to file

Most of our Python unittest-style tests only support the file protocol.
You can run them with any other protocol, but the test will simply
ignore your choice and use file anyway.

We should let them signal that they require the file protocol so they
are skipped when you want to test some other protocol.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 103cbc771e5660d1f5bb458be80aa9e363547ae0)
 Conflicts:
tests/qemu-iotests/257
*drop changes for tests not in 4.1.0
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
14 months agoiotests: Add supported protocols to execute_test()
Max Reitz [Mon, 2 Sep 2019 19:33:17 +0000 (21:33 +0200)] 
iotests: Add supported protocols to execute_test()

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 88d2aa533a4a1aad44a27c2e6cd5bc5fbcbce7ed)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
14 months agoiotests: add testing shim for script-style python tests
John Snow [Mon, 29 Jul 2019 20:35:53 +0000 (16:35 -0400)] 
iotests: add testing shim for script-style python tests

Because the new-style python tests don't use the iotests.main() test
launcher, we don't turn on the debugger logging for these scripts
when invoked via ./check -d.

Refactor the launcher shim into new and old style shims so that they
share environmental configuration.

Two cleanup notes: debug was not actually used as a global, and there
was no reason to create a class in an inner scope just to achieve
default variables; we can simply create an instance of the runner with
the values we want instead.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190709232550.10724-14-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
(cherry picked from commit 456a2d5ac7641c7e75c76328a561b528a8607a8e)
*prereq for 88d2aa533a
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
14 months agovpc: Return 0 from vpc_co_create() on success
Max Reitz [Mon, 2 Sep 2019 19:33:16 +0000 (21:33 +0200)] 
vpc: Return 0 from vpc_co_create() on success

blockdev_create_run() directly uses .bdrv_co_create()'s return value as
the job's return value.  Jobs must return 0 on success, not just any
nonnegative value.  Therefore, using blockdev-create for VPC images may
currently fail as the vpc driver may return a positive integer.

Because there is no point in returning a positive integer anywhere in
the block layer (all non-negative integers are generally treated as
complete success), we probably do not want to add more such cases.
Therefore, fix this problem by making the vpc driver always return 0 in
case of success.

Suggested-by: Kevin Wolf <kwolf@redhat.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 1a37e3124407b5a145d44478d3ecbdb89c63789f)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
14 months agox86: do not advertise die-id in query-hotpluggbale-cpus if '-smp dies' is not set
Igor Mammedov [Mon, 2 Sep 2019 12:02:22 +0000 (08:02 -0400)] 
x86: do not advertise die-id in query-hotpluggbale-cpus if '-smp dies' is not set

Commit 176d2cda0 (i386/cpu: Consolidate die-id validity in smp context) added
new 'die-id' topology property to CPUs and exposed it via QMP command
query-hotpluggable-cpus, which broke -device/device_add cpu-foo for existing
users that do not support die-id/dies yet. That's would be fine if it happened
to new machine type only but it also happened to old machine types,
which breaks migration from old QEMU to the new one, for example following CLI:

  OLD-QEMU -M pc-i440fx-4.0 -smp 1,max_cpus=2 \
           -device qemu64-x86_64-cpu,socket-id=1,core-id=0,thread-id
is not able to start with new QEMU, complaining about invalid die-id.

After discovering regression, the patch
   "pc: Don't make die-id mandatory unless necessary"
makes die-id optional so old CLI would work.

However it's not enough as new QEMU still exposes die-id via query-hotpluggbale-cpus
QMP command, so the users that started old machine type on new QEMU, using all
properties (including die-id) received from QMP command (as required), won't be
able to start old QEMU using the same properties since it doesn't support die-id.

Fix it by hiding die-id in query-hotpluggbale-cpus for all machine types in case
'-smp dies' is not provided on CLI or -smp dies = 1', in which case smp_dies == 1
and APIC ID is calculated in default way (as it was before DIE support) so we won't
need compat code as in both cases the topology provided to guest via CPUID is the same.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20190902120222.6179-1-imammedo@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
(cherry picked from commit c6c1bb89fb46f3b88f832e654cf5a6f7941aac51)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
14 months agopr-manager: Fix invalid g_free() crash bug
Markus Armbruster [Thu, 22 Aug 2019 13:38:46 +0000 (15:38 +0200)] 
pr-manager: Fix invalid g_free() crash bug

pr_manager_worker() passes its @opaque argument to g_free().  Wrong;
it points to pr_manager_worker()'s automatic @data.  Broken when
commit 2f3a7ab39be converted @data from heap- to stack-allocated.  Fix
by deleting the g_free().

Fixes: 2f3a7ab39bec4ba8022dc4d42ea641165b004e3e
Cc: qemu-stable@nongnu.org
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 6b9d62c2a9e83bbad73fb61406f0ff69b46ff6f3)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
14 months agoiotests: Test reverse sub-cluster qcow2 writes
Max Reitz [Fri, 23 Aug 2019 13:03:41 +0000 (15:03 +0200)] 
iotests: Test reverse sub-cluster qcow2 writes

This exercises the regression introduced in commit
50ba5b2d994853b38fed10e0841b119da0f8b8e5.  On my machine, it has close
to a 50 % false-negative rate, but that should still be sufficient to
test the fix.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Tested-by: Stefano Garzarella <sgarzare@redhat.com>
Tested-by: John Snow <jsnow@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit ae6ef0190981a21f2d4bc8dcee7253688f14fae7)
 Conflicts:
tests/qemu-iotests/group
*fix context deps on tests not in 4.1.0
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
14 months agoblock/file-posix: Reduce xfsctl() use
Max Reitz [Fri, 23 Aug 2019 13:03:40 +0000 (15:03 +0200)] 
block/file-posix: Reduce xfsctl() use

This patch removes xfs_write_zeroes() and xfs_discard().  Both functions
have been added just before the same feature was present through
fallocate():

- fallocate() has supported PUNCH_HOLE for XFS since Linux 2.6.38 (March
  2011); xfs_discard() was added in December 2010.

- fallocate() has supported ZERO_RANGE for XFS since Linux 3.15 (June
  2014); xfs_write_zeroes() was added in November 2013.

Nowadays, all systems that qemu runs on should support both fallocate()
features (RHEL 7's kernel does).

xfsctl() is still useful for getting the request alignment for O_DIRECT,
so this patch does not remove our dependency on it completely.

Note that xfs_write_zeroes() had a bug: It calls ftruncate() when the
file is shorter than the specified range (because ZERO_RANGE does not
increase the file length).  ftruncate() may yield and then discard data
that parallel write requests have written past the EOF in the meantime.
Dropping the function altogether fixes the bug.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Fixes: 50ba5b2d994853b38fed10e0841b119da0f8b8e5
Reported-by: Lukáš Doktor <ldoktor@redhat.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Tested-by: Stefano Garzarella <sgarzare@redhat.com>
Tested-by: John Snow <jsnow@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit b2c6f23f4a9f6d8f1b648705cd46d3713b78d6a2)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
14 months agoxen-bus: check whether the frontend is active during device reset...
Paul Durrant [Tue, 10 Sep 2019 17:17:53 +0000 (18:17 +0100)] 
xen-bus: check whether the frontend is active during device reset...

...not the backend

Commit cb323146 "xen-bus: Fix backend state transition on device reset"
contained a subtle mistake. The hunk

@@ -539,11 +556,11 @@ static void xen_device_backend_changed(void *opaque)

     /*
      * If the toolstack (or unplug request callback) has set the backend
-     * state to Closing, but there is no active frontend (i.e. the
-     * state is not Connected) then set the backend state to Closed.
+     * state to Closing, but there is no active frontend then set the
+     * backend state to Closed.
      */
     if (xendev->backend_state == XenbusStateClosing &&
-        xendev->frontend_state != XenbusStateConnected) {
+        !xen_device_state_is_active(state)) {
         xen_device_backend_set_state(xendev, XenbusStateClosed);
     }

mistakenly replaced the check of 'xendev->frontend_state' with a check
(now in a helper function) of 'state', which actually equates to
'xendev->backend_state'.

This patch fixes the mistake.

Fixes: cb3231460747552d70af9d546dc53d8195bcb796
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
Message-Id: <20190910171753.3775-1-paul.durrant@citrix.com>
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
(cherry picked from commit df6180bb56cd03949c2c64083da58755fed81a61)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
14 months agoxen-bus: Fix backend state transition on device reset
Anthony PERARD [Fri, 23 Aug 2019 10:15:33 +0000 (11:15 +0100)] 
xen-bus: Fix backend state transition on device reset

When a frontend wants to reset its state and the backend one, it
starts with setting "Closing", then waits for the backend (QEMU) to do
the same.

But when QEMU is setting "Closing" to its state, it triggers an event
(xenstore watch) that re-execute xen_device_backend_changed() and set
the backend state to "Closed". QEMU should wait for the frontend to
set "Closed" before doing the same.

Before setting "Closed" to the backend_state, we are also going to
check if there is a frontend. If that the case, when the backend state
is set to "Closing" the frontend should react and sets its state to
"Closing" then "Closed". The backend should wait for that to happen.

Fixes: b6af8926fb858c4f1426e5acb2cfc1f0580ec98a
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Message-Id: <20190823101534.465-2-anthony.perard@citrix.com>
(cherry picked from commit cb3231460747552d70af9d546dc53d8195bcb796)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
14 months agopc: Don't make die-id mandatory unless necessary
Eduardo Habkost [Fri, 16 Aug 2019 17:07:50 +0000 (14:07 -0300)] 
pc: Don't make die-id mandatory unless necessary

We have this issue reported when using libvirt to hotplug CPUs:
https://bugzilla.redhat.com/show_bug.cgi?id=1741451

Basically, libvirt is not copying die-id from
query-hotpluggable-cpus, but die-id is now mandatory.

We could blame libvirt and say it is not following the documented
interface, because we have this buried in the QAPI schema
documentation:

> Note: currently there are 5 properties that could be present
> but management should be prepared to pass through other
> properties with device_add command to allow for future
> interface extension. This also requires the filed names to be kept in
> sync with the properties passed to -device/device_add.

But I don't think this would be reasonable from us.  We can just
make QEMU more flexible and let die-id to be omitted when there's
no ambiguity.  This will allow us to keep compatibility with
existing libvirt versions.

Test case included to ensure we don't break this again.

Fixes: commit 176d2cda0dee ("i386/cpu: Consolidate die-id validity in smp context")
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20190816170750.23910-1-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
(cherry picked from commit fea374e7c8079563bca7c8fac895c6a880f76adc)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
14 months agotarget/alpha: fix tlb_fill trap_arg2 value for instruction fetch
Aurelien Jarno [Thu, 22 Aug 2019 17:45:14 +0000 (10:45 -0700)] 
target/alpha: fix tlb_fill trap_arg2 value for instruction fetch

Commit e41c94529740cc26 ("target/alpha: Convert to CPUClass::tlb_fill")
slightly changed the way the trap_arg2 value is computed in case of TLB
fill. The type of the variable used in the ternary operator has been
changed from an int to an enum. This causes the -1 value to not be
sign-extended to 64-bit in case of an instruction fetch. The trap_arg2
ends up with 0xffffffff instead of 0xffffffffffffffff. Fix that by
changing the -1 into -1LL.

This fixes the execution of user space processes in qemu-system-alpha.

Fixes: e41c94529740cc26
Cc: qemu-stable@nongnu.org
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
[rth: Test MMU_DATA_LOAD and MMU_DATA_STORE instead of implying them.]
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
(cherry picked from commit cb1de55a83eaca9ee32be9c959dca99e11f2fea8)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
14 months agos390x/tcg: Fix VERIM with 32/64 bit elements
David Hildenbrand [Wed, 14 Aug 2019 15:12:42 +0000 (17:12 +0200)] 
s390x/tcg: Fix VERIM with 32/64 bit elements

Wrong order of operands. The constant always comes last. Makes QEMU crash
reliably on specific git fetch invocations.

Reported-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20190814151242.27199-1-david@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Fixes: 5c4b0ab460ef ("s390x/tcg: Implement VECTOR ELEMENT ROTATE AND INSERT UNDER MASK")
Cc: qemu-stable@nongnu.org
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
(cherry picked from commit 25bcb45d1b81d22634daa2b1a2d8bee746ac129b)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
14 months agoRevert "ide/ahci: Check for -ECANCELED in aio callbacks"
John Snow [Mon, 29 Jul 2019 22:36:05 +0000 (18:36 -0400)] 
Revert "ide/ahci: Check for -ECANCELED in aio callbacks"

This reverts commit 0d910cfeaf2076b116b4517166d5deb0fea76394.

It's not correct to just ignore an error code in a callback; we need to
handle that error and possible report failure to the guest so that they
don't wait indefinitely for an operation that will now never finish.

This ought to help cases reported by Nutanix where iSCSI returns a
legitimate -ECANCELED for certain operations which should be propagated
normally.

Reported-by: Shaju Abraham <shaju.abraham@nutanix.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Message-id: 20190729223605.7163-1-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
(cherry picked from commit 8ec41c4265714255d5a138f8b538faf3583dcff6)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
14 months agodma-helpers: ensure AIO callback is invoked after cancellation
Paolo Bonzini [Mon, 29 Jul 2019 21:34:16 +0000 (23:34 +0200)] 
dma-helpers: ensure AIO callback is invoked after cancellation

dma_aio_cancel unschedules the BH if there is one, which corresponds
to the reschedule_dma case of dma_blk_cb.  This can stall the DMA
permanently, because dma_complete will never get invoked and therefore
nobody will ever invoke the original AIO callback in dbs->common.cb.

Fix this by invoking the callback (which is ensured to happen after
a bdrv_aio_cancel_async, or done manually in the dbs->bh case), and
add assertions to check that the DMA state machine is indeed waiting
for dma_complete or reschedule_dma, but never both.

Reported-by: John Snow <jsnow@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 20190729213416.1972-1-pbonzini@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
(cherry picked from commit 539343c0a47e19d5dd64d846d64d084d9793681f)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
17 months agoUpdate version for v4.1.0 release v4.1.0
Peter Maydell [Thu, 15 Aug 2019 12:03:37 +0000 (13:03 +0100)] 
Update version for v4.1.0 release

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
17 months agoUpdate version for v4.1.0-rc5 release v4.1.0-rc5
Peter Maydell [Tue, 13 Aug 2019 14:38:38 +0000 (15:38 +0100)] 
Update version for v4.1.0-rc5 release

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
17 months agoriscv: roms: Fix make rules for building sifive_u bios
Bin Meng [Sat, 3 Aug 2019 06:08:04 +0000 (23:08 -0700)] 
riscv: roms: Fix make rules for building sifive_u bios

Currently the make rules are wrongly using qemu/virt opensbi image
for sifive_u machine. Correct it.

Signed-off-by: Bin Meng <bmeng.cn@gmail.com>
Reviewed-by: Chih-Min Chao <chihmin.chao@sifive.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-id: 1564812484-20385-1-git-send-email-bmeng.cn@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
17 months agoMerge remote-tracking branch 'remotes/dgibson/tags/ppc-for-4.1-20190813' into staging
Peter Maydell [Tue, 13 Aug 2019 10:35:30 +0000 (11:35 +0100)] 
Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-4.1-20190813' into staging

ppc patch queue 2019-08-13 (last minute qemu-4.1 fixes)

Here's a very, very last minute pull request for qemu-4.1.  This fixes
two nasty bugs with the XIVE interrupt controller in "dual" mode
(where the guest decides which interrupt controller it wants to use).
One occurs when resetting the guest while I/O is active, and the other
with migration of hotplugged CPUs.

The timing here is very unfortunate.  Alas, we only spotted these bugs
very late, and I was sick last week, delaying analysis and fix even
further.

This series hasn't had nearly as much testing as I'd really like, but
I'd still like to squeeze it into qemu-4.1 if possible, since
definitely fixing two bad bugs seems like an acceptable tradeoff for
the risk of introducing different bugs.

# gpg: Signature made Tue 13 Aug 2019 07:56:42 BST
# gpg:                using RSA key 75F46586AE61A66CC44E87DC6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>" [full]
# gpg:                 aka "David Gibson (Red Hat) <dgibson@redhat.com>" [full]
# gpg:                 aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>" [full]
# gpg:                 aka "David Gibson (kernel.org) <dwg@kernel.org>" [unknown]
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E  87DC 6C38 CACA 20D9 B392

* remotes/dgibson/tags/ppc-for-4.1-20190813:
  spapr/xive: Fix migration of hot-plugged CPUs
  spapr: Reset CAS & IRQ subsystem after devices

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
17 months agospapr/xive: Fix migration of hot-plugged CPUs
Cédric Le Goater [Tue, 13 Aug 2019 06:48:53 +0000 (08:48 +0200)] 
spapr/xive: Fix migration of hot-plugged CPUs

The migration sequence of a guest using the XIVE exploitation mode
relies on the fact that the states of all devices are restored before
the machine is. This is not true for hot-plug devices such as CPUs
which state come after the machine. This breaks migration because the
thread interrupt context registers are not correctly set.

Fix migration of hotplugged CPUs by restoring their context in the
'post_load' handler of the XiveTCTX model.

Fixes: 277dd3d7712a ("spapr/xive: add migration support for KVM")
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190813064853.29310-1-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
17 months agospapr: Reset CAS & IRQ subsystem after devices
David Gibson [Tue, 13 Aug 2019 05:59:18 +0000 (15:59 +1000)] 
spapr: Reset CAS & IRQ subsystem after devices

This fixes a nasty regression in qemu-4.1 for the 'pseries' machine,
caused by the new "dual" interrupt controller model.  Specifically,
qemu can crash when used with KVM if a 'system_reset' is requested
while there's active I/O in the guest.

The problem is that in spapr_machine_reset() we:

1. Reset the CAS vector state
spapr_ovec_cleanup(spapr->ov5_cas);

2. Reset all devices
qemu_devices_reset()

3. Reset the irq subsystem
spapr_irq_reset();

However (1) implicitly changes the interrupt delivery mode, because
whether we're using XICS or XIVE depends on the CAS state.  We don't
properly initialize the new irq mode until (3) though - in particular
setting up the KVM devices.

During (2), we can temporarily drop the BQL allowing some irqs to be
delivered which will go to an irq system that's not properly set up.

Specifically, if the previous guest was in (KVM) XIVE mode, the CAS
reset will put us back in XICS mode.  kvm_kernel_irqchip() still
returns true, because XIVE was using KVM, however XICs doesn't have
its KVM components intialized and kernel_xics_fd == -1.  When the irq
is delivered it goes via ics_kvm_set_irq() which assert()s that
kernel_xics_fd != -1.

This change addresses the problem by delaying the CAS reset until
after the devices reset.  The device reset should quiesce all the
devices so we won't get irqs delivered while we mess around with the
IRQ.  The CAS reset and irq re-initialize should also now be under the
same BQL critical section so nothing else should be able to interrupt
it either.

We also move the spapr_irq_msi_reset() used in one of the legacy irq
modes, since it logically makes sense at the same point as the
spapr_irq_reset() (it's essentially an equivalent operation for older
machine types).  Since we don't need to switch between different
interrupt controllers for those old machine types it shouldn't
actually be broken in those cases though.

Cc: Cédric Le Goater <clg@kaod.org>
Fixes: b2e22477 "spapr: add a 'reset' method to the sPAPR IRQ backend"
Fixes: 13db0cd9 "spapr: introduce a new sPAPR IRQ backend supporting
                 XIVE and XICS"
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
17 months agodisplay/bochs: fix pcie support
Gerd Hoffmann [Mon, 12 Aug 2019 06:52:21 +0000 (08:52 +0200)] 
display/bochs: fix pcie support

Set QEMU_PCI_CAP_EXPRESS unconditionally in init(), then clear it in
realize() in case the device is not connected to a PCIe bus.

This makes sure the pci config space allocation is big enough, so
accessing the PCIe extended config space doesn't overflow the pci
config space buffer.

PCI(e) config space is guest writable.  Writes are limited by
write mask (which probably is also filled with random stuff),
so the guest can only flip enabled bits.  But I suspect it
still might be exploitable, so rather serious because it might
be a host escape for the guest.  On the other hand the device
is probably not yet in widespread use.

(For a QEMU version without this commit, a mitigation for the
bug is available: use "-device bochs-display" as a conventional pci
device only.)

Cc: qemu-stable@nongnu.org
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 20190812065221.20907-2-kraxel@redhat.com
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
17 months agoUpdate version for v4.1.0-rc4 release v4.1.0-rc4
Peter Maydell [Tue, 6 Aug 2019 16:05:21 +0000 (17:05 +0100)] 
Update version for v4.1.0-rc4 release

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
17 months agocompat: disable edid on virtio-gpu base device
Cornelia Huck [Tue, 6 Aug 2019 11:58:19 +0000 (13:58 +0200)] 
compat: disable edid on virtio-gpu base device

'edid' is a property of the virtio-gpu base device, so turning
it off on virtio-gpu-pci is not enough (it misses -ccw). Turn
it off on the base device instead.

Fixes: 0a71966253c8 ("edid: flip the default to enabled")
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 20190806115819.16026-1-cohuck@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
17 months agoMerge remote-tracking branch 'remotes/maxreitz/tags/pull-block-2019-08-06' into staging
Peter Maydell [Tue, 6 Aug 2019 12:40:31 +0000 (13:40 +0100)] 
Merge remote-tracking branch 'remotes/maxreitz/tags/pull-block-2019-08-06' into staging

Block patches for 4.1.0-rc4:
- Fix the backup block job when using copy offloading
- Fix the mirror block job when using the write-blocking copy mode
- Fix incremental backups after the image has been grown with the
  respective bitmap attached to it

# gpg: Signature made Tue 06 Aug 2019 12:57:07 BST
# gpg:                using RSA key 91BEB60A30DB3E8857D11829F407DB0061D5CF40
# gpg:                issuer "mreitz@redhat.com"
# gpg: Good signature from "Max Reitz <mreitz@redhat.com>" [full]
# Primary key fingerprint: 91BE B60A 30DB 3E88 57D1  1829 F407 DB00 61D5 CF40

* remotes/maxreitz/tags/pull-block-2019-08-06:
  block/backup: disable copy_range for compressed backup
  iotests: Test unaligned blocking mirror write
  mirror: Only mirror granularity-aligned chunks
  iotests: Test incremental backup after truncation
  util/hbitmap: update orig_size on truncate
  iotests: Test backup job with two guest writes
  backup: Copy only dirty areas

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
17 months agoblock/backup: disable copy_range for compressed backup
Vladimir Sementsov-Ogievskiy [Tue, 30 Jul 2019 16:32:50 +0000 (19:32 +0300)] 
block/backup: disable copy_range for compressed backup

Enabled by default copy_range ignores compress option. It's definitely
unexpected for user.

It's broken since introduction of copy_range usage in backup in
9ded4a011496.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20190730163251.755248-3-vsementsov@virtuozzo.com
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
17 months agoiotests: Test unaligned blocking mirror write
Max Reitz [Mon, 5 Aug 2019 11:35:26 +0000 (13:35 +0200)] 
iotests: Test unaligned blocking mirror write

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190805113526.20319-1-mreitz@redhat.com
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
17 months agomirror: Only mirror granularity-aligned chunks
Max Reitz [Mon, 5 Aug 2019 15:33:08 +0000 (17:33 +0200)] 
mirror: Only mirror granularity-aligned chunks

In write-blocking mode, all writes to the top node directly go to the
target.  We must only mirror chunks of data that are aligned to the
job's granularity, because that is how the dirty bitmap works.
Therefore, the request alignment for writes must be the job's
granularity (in write-blocking mode).

Unfortunately, this forces all reads and writes to have the same
granularity (we only need this alignment for writes to the target, not
the source), but that is something to be fixed another time.

Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190805153308.2657-1-mreitz@redhat.com
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Fixes: d06107ade0ce74dc39739bac80de84b51ec18546
Signed-off-by: Max Reitz <mreitz@redhat.com>
17 months agoiotests: Test incremental backup after truncation
Max Reitz [Mon, 5 Aug 2019 15:28:40 +0000 (17:28 +0200)] 
iotests: Test incremental backup after truncation

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190805152840.32190-1-mreitz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
17 months agoutil/hbitmap: update orig_size on truncate
Vladimir Sementsov-Ogievskiy [Mon, 5 Aug 2019 12:01:20 +0000 (15:01 +0300)] 
util/hbitmap: update orig_size on truncate

Without this, hbitmap_next_zero and hbitmap_next_dirty_area are broken
after truncate. So, orig_size is broken since it's introduction in
76d570dc495c56bb.

Fixes: 76d570dc495c56bb
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20190805120120.23585-1-vsementsov@virtuozzo.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
17 months agoiotests: Test backup job with two guest writes
Max Reitz [Thu, 1 Aug 2019 17:39:00 +0000 (19:39 +0200)] 
iotests: Test backup job with two guest writes

Perform two guest writes to not yet backed up areas of an image, where
the former touches an inner area of the latter.

Before HEAD^, copy offloading broke this in two ways:
(1) The target image differs from the reference image (what the source
    was when the backup started).
(2) But you will not see that in the failing output, because the job
    offset is reported as being greater than the job length.  This is
    because one cluster is copied twice, and thus accounted for twice,
    but of course the job length does not increase.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20190801173900.23851-3-mreitz@redhat.com
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Tested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
17 months agobackup: Copy only dirty areas
Max Reitz [Thu, 1 Aug 2019 17:38:59 +0000 (19:38 +0200)] 
backup: Copy only dirty areas

The backup job must only copy areas that the copy_bitmap reports as
dirty.  This is always the case when using traditional non-offloading
backup, because it copies each cluster separately.  When offloading the
copy operation, we sometimes copy more than one cluster at a time, but
we only check whether the first one is dirty.

Therefore, whenever copy offloading is possible, the backup job
currently produces wrong output when the guest writes to an area of
which an inner part has already been backed up, because that inner part
will be re-copied.

Fixes: 9ded4a0114968e98b41494fc035ba14f84cdf700
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20190801173900.23851-2-mreitz@redhat.com
Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
17 months agoMerge remote-tracking branch 'remotes/philmd-gitlab/tags/edk2-next-20190803' into...
Peter Maydell [Mon, 5 Aug 2019 10:05:36 +0000 (11:05 +0100)] 
Merge remote-tracking branch 'remotes/philmd-gitlab/tags/edk2-next-20190803' into staging

A harmless build-sys patch that fixes a regression affecting Linux
distributions packaging QEMU.

# gpg: Signature made Sat 03 Aug 2019 09:24:15 BST
# gpg:                using RSA key E3E32C2CDEADC0DE
# gpg: Good signature from "Philippe Mathieu-Daudé (F4BUG) <f4bug@amsat.org>" [full]
# Primary key fingerprint: FAAB E75E 1291 7221 DCFD  6BB2 E3E3 2C2C DEAD C0DE

* remotes/philmd-gitlab/tags/edk2-next-20190803:
  Makefile: remove DESTDIR from firmware file content

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
17 months agoMakefile: remove DESTDIR from firmware file content
Olaf Hering [Thu, 30 May 2019 19:28:11 +0000 (21:28 +0200)] 
Makefile: remove DESTDIR from firmware file content

The resulting firmware files should only contain the runtime path.
Fixes commit 26ce90fde5c ("Makefile: install the edk2 firmware images
and their descriptors")

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Message-Id: <20190530192812.17637-1-olaf@aepfle.de>
Fixes: https://bugs.launchpad.net/qemu/+bug/1838703
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
17 months agotarget/arm: Avoid bogus NSACR traps on M-profile without Security Extension
Peter Maydell [Thu, 1 Aug 2019 10:57:42 +0000 (11:57 +0100)] 
target/arm: Avoid bogus NSACR traps on M-profile without Security Extension

In Arm v8.0 M-profile CPUs without the Security Extension and also in
v7M CPUs, there is no NSACR register. However, the code we have to handle
the FPU does not always check whether the ARM_FEATURE_M_SECURITY bit
is set before testing whether env->v7m.nsacr permits access to the
FPU. This means that for a CPU with an FPU but without the Security
Extension we would always take a bogus fault when trying to stack
the FPU registers on an exception entry.

We could fix this by adding extra feature bit checks for all uses,
but it is simpler to just make the internal value of nsacr 0xcff
("all non-secure accesses allowed"), since this is not guest
visible when the Security Extension is not present. This allows
us to continue to follow the Arm ARM pseudocode which takes a
similar approach. (In particular, in the v8.1 Arm ARM the register
is documented as reading as 0xcff in this configuration.)

Fixes: https://bugs.launchpad.net/qemu/+bug/1838475
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Damien Hedde <damien.hedde@greensocs.com>
Message-id: 20190801105742.20036-1-peter.maydell@linaro.org

17 months agoMerge remote-tracking branch 'remotes/elmarco/tags/slirp-CVE-2019-14378-pull-request...
Peter Maydell [Fri, 2 Aug 2019 12:06:03 +0000 (13:06 +0100)] 
Merge remote-tracking branch 'remotes/elmarco/tags/slirp-CVE-2019-14378-pull-request' into staging

Slirp CVE-2019-14378 pull request

# gpg: Signature made Fri 02 Aug 2019 12:17:24 BST
# gpg:                using RSA key 87A9BD933F87C606D276F62DDAE8E10975969CE5
# gpg:                issuer "marcandre.lureau@redhat.com"
# gpg: Good signature from "Marc-André Lureau <marcandre.lureau@redhat.com>" [full]
# gpg:                 aka "Marc-André Lureau <marcandre.lureau@gmail.com>" [full]
# Primary key fingerprint: 87A9 BD93 3F87 C606 D276  F62D DAE8 E109 7596 9CE5

* remotes/elmarco/tags/slirp-CVE-2019-14378-pull-request:
  slirp: update with CVE-2019-14378 fix

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
17 months agoslirp: update with CVE-2019-14378 fix
Marc-André Lureau [Fri, 2 Aug 2019 11:14:56 +0000 (15:14 +0400)] 
slirp: update with CVE-2019-14378 fix

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
17 months agoUpdate version for v4.1.0-rc3 release v4.1.0-rc3
Peter Maydell [Tue, 30 Jul 2019 21:02:05 +0000 (22:02 +0100)] 
Update version for v4.1.0-rc3 release

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
17 months agoMerge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging
Peter Maydell [Tue, 30 Jul 2019 19:53:26 +0000 (20:53 +0100)] 
Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging

pci: bugfix

A last minute fix to cross-version migration.
Better late than never.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
# gpg: Signature made Tue 30 Jul 2019 17:07:42 BST
# gpg:                using RSA key 281F0DB8D28D5469
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full]
# gpg:                 aka "Michael S. Tsirkin <mst@redhat.com>" [full]
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17  0970 C350 3912 AFBE 8E67
#      Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA  8A0D 281F 0DB8 D28D 5469

* remotes/mst/tags/for_upstream:
  pcie_root_port: Disable ACS on older machines
  pcie_root_port: Allow ACS to be disabled

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
17 months agopcie_root_port: Disable ACS on older machines
Dr. David Alan Gilbert [Tue, 30 Jul 2019 09:37:19 +0000 (10:37 +0100)] 
pcie_root_port: Disable ACS on older machines

ACS got added in 4.0 unconditionally,  that broke older<->4.0 migration
where there was a PCIe root port.
Fix this by turning it off for 3.1 and older machines; note this
fixes compatibility for older QEMUs but breaks compatibility with 4.0
for older machine types.

    machine type    source qemu   dest qemu
       3.1             3.1           4.0        broken
       3.1             3.1           4.1rc2     broken
       3.1             3.1           4.1+this   OK ++
       3.1             4.0           4.1rc2     OK
       3.1             4.0           4.1+this   broken --
       4.0             4.0           4.1rc2     OK
       4.0             4.0           4.1+this   OK

So we gain and lose; the consensus seems to be treat this as a
fix for older machine types.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20190730093719.12958-3-dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
17 months agopcie_root_port: Allow ACS to be disabled
Dr. David Alan Gilbert [Tue, 30 Jul 2019 09:37:18 +0000 (10:37 +0100)] 
pcie_root_port: Allow ACS to be disabled

ACS was added in 4.0 unconditionally, this breaks migration
compatibility.
Allow ACS to be disabled by adding a property that's
checked by pcie_root_port.

Unfortunately pcie-root-port doesn't have any instance data,
so there's no where for that flag to live, so stuff it into
PCIESlot.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20190730093719.12958-2-dgilbert@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>