qemu.git
2 years agochar-socket: Lock tcp_chr_disconnect() and socket_reconnect_timeout()
Alberto Garcia [Mon, 12 Aug 2019 15:58:29 +0000 (18:58 +0300)] 
char-socket: Lock tcp_chr_disconnect() and socket_reconnect_timeout()

There's a race condition in which the tcp_chr_read() ioc handler can
close a connection that is being written to from another thread.

Running iotest 136 in a loop triggers this problem and crashes QEMU.

 (gdb) bt
 #0  0x00005558b842902d in object_get_class (obj=0x0) at qom/object.c:860
 #1  0x00005558b84f92db in qio_channel_writev_full (ioc=0x0, iov=0x7ffc355decf0, niov=1, fds=0x0, nfds=0, errp=0x0) at io/channel.c:76
 #2  0x00005558b84e0e9e in io_channel_send_full (ioc=0x0, buf=0x5558baf5beb0, len=138, fds=0x0, nfds=0) at chardev/char-io.c:123
 #3  0x00005558b84e4a69 in tcp_chr_write (chr=0x5558ba460380, buf=0x5558baf5beb0 "...", len=138) at chardev/char-socket.c:135
 #4  0x00005558b84dca55 in qemu_chr_write_buffer (s=0x5558ba460380, buf=0x5558baf5beb0 "...", len=138, offset=0x7ffc355dedd0, write_all=false) at chardev/char.c:112
 #5  0x00005558b84dcbc2 in qemu_chr_write (s=0x5558ba460380, buf=0x5558baf5beb0 "...", len=138, write_all=false) at chardev/char.c:147
 #6  0x00005558b84dfb26 in qemu_chr_fe_write (be=0x5558ba476610, buf=0x5558baf5beb0 "...", len=138) at chardev/char-fe.c:42
 #7  0x00005558b8088c86 in monitor_flush_locked (mon=0x5558ba476610) at monitor.c:406
 #8  0x00005558b8088e8c in monitor_puts (mon=0x5558ba476610, str=0x5558ba921e49 "") at monitor.c:449
 #9  0x00005558b8089178 in qmp_send_response (mon=0x5558ba476610, rsp=0x5558bb161600) at monitor.c:498
 #10 0x00005558b808920c in monitor_qapi_event_emit (event=QAPI_EVENT_SHUTDOWN, qdict=0x5558bb161600) at monitor.c:526
 #11 0x00005558b8089307 in monitor_qapi_event_queue_no_reenter (event=QAPI_EVENT_SHUTDOWN, qdict=0x5558bb161600) at monitor.c:551
 #12 0x00005558b80896c0 in qapi_event_emit (event=QAPI_EVENT_SHUTDOWN, qdict=0x5558bb161600) at monitor.c:626
 #13 0x00005558b855f23b in qapi_event_send_shutdown (guest=false, reason=SHUTDOWN_CAUSE_HOST_QMP_QUIT) at qapi/qapi-events-run-state.c:43
 #14 0x00005558b81911ef in qemu_system_shutdown (cause=SHUTDOWN_CAUSE_HOST_QMP_QUIT) at vl.c:1837
 #15 0x00005558b8191308 in main_loop_should_exit () at vl.c:1885
 #16 0x00005558b819140d in main_loop () at vl.c:1924
 #17 0x00005558b8198c84 in main (argc=18, argv=0x7ffc355df3f8, envp=0x7ffc355df490) at vl.c:4665

This patch adds a lock to protect tcp_chr_disconnect() and
socket_reconnect_timeout()

Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Message-Id: <1565625509-404969-3-git-send-email-andrey.shinkevich@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agomain-loop: Fix GSource leak in qio_task_thread_worker()
Alberto Garcia [Mon, 12 Aug 2019 15:58:28 +0000 (18:58 +0300)] 
main-loop: Fix GSource leak in qio_task_thread_worker()

After g_source_attach() the GMainContext holds a reference to the
GSource, so the caller does not need to keep it.

qio_task_thread_worker() is not releasing its reference so the GSource
is being leaked since a17536c594bfed94d05667b419f747b692f5fc7f.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <1565625509-404969-2-git-send-email-andrey.shinkevich@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agomemory: Fix up memory_region_{add|del}_coalescing
Peter Xu [Tue, 20 Aug 2019 14:13:28 +0000 (22:13 +0800)] 
memory: Fix up memory_region_{add|del}_coalescing

The old memory_region_{add|clear}_coalescing() has some defects
because they both changed mr->coalesced before updating the regions
using memory_region_update_coalesced_range_as().  Then when the
regions were updated in memory_region_update_coalesced_range_as() the
mr->coalesced will always be either one more or one less.  So:

- For memory_region_add_coalescing: it'll always trying to remove the
  newly added coalesced region while it shouldn't, and,

- For memory_region_clear_coalescing: when it calls the update there
  will be no coalesced ranges on mr->coalesced because they were all
  removed before hand so the update will probably do nothing for real.

Let's fix this.  Now we've got flat_range_coalesced_io_notify() to
notify a single CoalescedMemoryRange instance change, so use it in the
existing memory_region_update_coalesced_range() logic by only notify
either an addition or deletion.  Then we hammer both the
memory_region_{add|clear}_coalescing() to use it.

Fixes: 3ac7d43a6fbb5d4a3
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20190820141328.10009-5-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agomemory: Remove has_coalesced_range counter
Peter Xu [Tue, 20 Aug 2019 14:13:26 +0000 (22:13 +0800)] 
memory: Remove has_coalesced_range counter

The has_coalesced_range could potentially be problematic in that it
only works for additions of coalesced mmio ranges but not deletions.
The reason is that has_coalesced_range information can be lost when
the FlatView updates the topology again when the updated region is not
covering the coalesced regions. When that happens, due to
flatrange_equal() is not checking against has_coalesced_range, the new
FlatRange will be seen as the same one as the old and the new
instance (whose has_coalesced_range will be zero) will replace the old
instance (whose has_coalesced_range _could_ be non-zero).

The counter was originally used to make sure every FlatRange will only
notify once for coalesced_io_{add|del} memory listeners, because each
FlatRange can be used by multiple address spaces, so logically
speaking it could be called multiple times.  However we should not
limit that, because memory listeners should will only be registered
with specific address space rather than multiple address spaces.

So let's fix this up by simply removing the whole has_coalesced_range.

Fixes: 3ac7d43a6fbb5d4a3
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20190820141328.10009-3-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agomemory: Split zones when do coalesced_io_del()
Peter Xu [Tue, 20 Aug 2019 14:13:25 +0000 (22:13 +0800)] 
memory: Split zones when do coalesced_io_del()

It is a workaround of current KVM's KVM_UNREGISTER_COALESCED_MMIO
interface.  The kernel interface only allows to unregister an mmio
device with exactly the zone size when registered, or any smaller zone
that is included in the device mmio zone.  It does not support the
userspace to specify a very large zone to remove all the small mmio
devices within the zone covered.

Logically speaking it would be nicer to fix this from KVM side, though
in all cases we still need to coop with old kernels so let's do this.

Fixes: 3ac7d43a6fbb5d4a3
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20190820141328.10009-2-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agomemory: Refactor memory_region_clear_coalescing
Peter Xu [Tue, 20 Aug 2019 14:13:27 +0000 (22:13 +0800)] 
memory: Refactor memory_region_clear_coalescing

Removing the update variable and quit earlier if the memory region has
no coalesced range.  This prepares for the next patch.

Fixes: 3ac7d43a6fbb5d4a3
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20190820141328.10009-4-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agominikconf: don't print CONFIG_FOO=n lines
Marc-André Lureau [Thu, 25 Jul 2019 19:36:15 +0000 (23:36 +0400)] 
minikconf: don't print CONFIG_FOO=n lines

qemu in general doesn't define CONFIG_FOO if it's false.  This also
helps with the dumb kconfig parser from meson, as source_set considers
any non-empty value as true.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoconfigure: remove AUTOCONF_HOST
Marc-André Lureau [Tue, 23 Jul 2019 12:14:49 +0000 (16:14 +0400)] 
configure: remove AUTOCONF_HOST

This is a left-over from commit
c12b6d70e384c769ca372e15ffd19b3e9f563662 ("pixman: drop submodule")

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agotests: add module loading test
Marc-André Lureau [Mon, 22 Jul 2019 18:51:40 +0000 (22:51 +0400)] 
tests: add module loading test

This test will simply check that modules can be loaded, and no symbols
are missing.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agomodule: return success on module load
Marc-André Lureau [Mon, 22 Jul 2019 13:13:23 +0000 (17:13 +0400)] 
module: return success on module load

Let the caller know of load success.

Note that this also changes slightly the behaviour of the function to
try loading on subsequent calls if the previous ones failed.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agomodule: use g_hash_table_add()
Marc-André Lureau [Mon, 22 Jul 2019 13:10:46 +0000 (17:10 +0400)] 
module: use g_hash_table_add()

The hashtable is used like a set, use the convenience
g_hash_table_add() function.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoconfigure: define CONFIG_TOOLS here
Paolo Bonzini [Thu, 18 Jul 2019 10:24:29 +0000 (12:24 +0200)] 
configure: define CONFIG_TOOLS here

Defining CONFIG_TOOLS on the basis of $(TOOLS) has the disadvantage
of including it also if e.g. qemu-ga is requested.  The correct
information is available in configure, define it there.

This also has the benefit of not installing the manpages for block layer
tools if the only "tool" being built is the guest agent.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoqemu-ga: clean up TOOLS variable
Paolo Bonzini [Thu, 18 Jul 2019 10:22:01 +0000 (12:22 +0200)] 
qemu-ga: clean up TOOLS variable

qemu-ga is included in the TOOLS variable without the .exe suffix, and this is
then worked around twice in the Makefile.  Do the right thing in configure
instead.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoMerge remote-tracking branch 'remotes/dgibson/tags/ppc-for-4.2-20190821' into staging
Peter Maydell [Wed, 21 Aug 2019 13:04:16 +0000 (14:04 +0100)] 
Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-4.2-20190821' into staging

ppc patch queue for 2019-08-21

First ppc and spapr pull request for qemu-4.2.  Includes:
   * Some TCG emulation fixes and performance improvements
   * Support for the mffsl instruction in TCG
   * Added missing DPDES SPR
   * Some enhancements to the emulation of the XIVE interrupt
     controller
   * Cleanups to spapr MSI management
   * Some new suspend/resume infrastructure and a draft suspend
     implementation for spapr
   * New spapr hypercall for TPM communication (will be needed for
     secure guests under an Ultravisor)
   * Fix several memory leaks

And a few other assorted fixes.

# gpg: Signature made Wed 21 Aug 2019 08:24:44 BST
# gpg:                using RSA key 75F46586AE61A66CC44E87DC6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>" [full]
# gpg:                 aka "David Gibson (Red Hat) <dgibson@redhat.com>" [full]
# gpg:                 aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>" [full]
# gpg:                 aka "David Gibson (kernel.org) <dwg@kernel.org>" [unknown]
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E  87DC 6C38 CACA 20D9 B392

* remotes/dgibson/tags/ppc-for-4.2-20190821: (42 commits)
  ppc: Fix emulated single to double denormalized conversions
  ppc: Fix emulated INFINITY and NAN conversions
  ppc: conform to processor User's Manual for xscvdpspn
  ppc: Add support for 'mffsl' instruction
  target/ppc: Add Directed Privileged Door-bell Exception State (DPDES) SPR
  spapr/xive: Mask the EAS when allocating an IRQ
  spapr: Implement better workaround in spapr-vty device
  spapr/irq: Drop spapr_irq_msi_reset()
  spapr/pci: Free MSIs during reset
  spapr/pci: Consolidate de-allocation of MSIs
  ppc: remove idle_timer logic
  spapr: Implement ibm,suspend-me
  i386: use machine class ->wakeup method
  machine: Add wakeup method to MachineClass
  ppc/xive: Improve 'info pic' support
  ppc/xive: Provide silent escalation support
  ppc/xive: Provide unconditional escalation support
  ppc/xive: Provide escalation support
  ppc/xive: Provide backlog support
  ppc/xive: Implement TM_PULL_OS_CTX special command
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2 years agoMerge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging
Peter Maydell [Wed, 21 Aug 2019 08:00:49 +0000 (09:00 +0100)] 
Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging

* New KVM PV features (Marcelo, Wanpeng)
* valgrind fixes (Andrey)
* Remove clock reset notifiers (David)
* KConfig and Makefile cleanups (Paolo)
* Replay and icount improvements (Pavel)
* x86 FP fixes (Peter M.)
* TCG locking assertions (Roman)
* x86 support for mmap-ed -kernel/-initrd (Stefano)
* Other cleanups (Wei Yang, Yan Zhao, Tony)
* LSI fix for infinite loop (Prasad)
* ARM migration fix (Catherine)
* AVX512_BF16 feature (Jing)

# gpg: Signature made Tue 20 Aug 2019 19:00:54 BST
# gpg:                using RSA key BFFBD25F78C7AE83
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full]
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>" [full]
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* remotes/bonzini/tags/for-upstream: (33 commits)
  x86: Intel AVX512_BF16 feature enabling
  scsi: lsi: exit infinite loop while executing script (CVE-2019-12068)
  test-bitmap: test set 1 bit case for bitmap_set
  migration: do not rom_reset() during incoming migration
  HACKING: Document 'struct' keyword usage
  kvm: vmxcap: Enhance with latest features
  cpus-common: nuke finish_safe_work
  icount: remove unnecessary gen_io_end calls
  icount: clean up cpu_can_io at the entry to the block
  replay: rename step-related variables and functions
  replay: refine replay-time module
  replay: fix replay shutdown
  util/qemu-timer: refactor deadline calculation for external timers
  replay: document development rules
  replay: add missing fix for internal function
  timer: last, remove last bits of last
  replay: Remove host_clock_last
  timer: Remove reset notifiers
  mc146818rtc: Remove reset notifiers
  memory: fix race between TCG and accesses to dirty bitmap
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2 years agoppc: Fix emulated single to double denormalized conversions
Paul A. Clarke [Mon, 19 Aug 2019 21:42:16 +0000 (16:42 -0500)] 
ppc: Fix emulated single to double denormalized conversions

helper_todouble() was not properly converting any denormalized 32 bit
float to 64 bit double.

Fix-suggested-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paul A. Clarke <pc@us.ibm.com>
v2:
- Splitting patch "ppc: Three floating point fixes"; this is just one part.
- Original suggested "fix" was likely flawed.  v2 is rewritten by
  Richard Henderson (Thanks, Richard!); I reformatted the comments in a
  couple of places, compiled, and tested.
Message-Id: <1566250936-14538-1-git-send-email-pc@us.ibm.com>

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2 years agoppc: Fix emulated INFINITY and NAN conversions
Paul A. Clarke [Mon, 19 Aug 2019 19:19:48 +0000 (14:19 -0500)] 
ppc: Fix emulated INFINITY and NAN conversions

helper_todouble() was not properly converting INFINITY from 32 bit
float to 64 bit double.

(Normalized operand conversion is unchanged, other than indentation.)

Signed-off-by: Paul A. Clarke <pc@us.ibm.com>
Message-Id: <1566242388-9244-1-git-send-email-pc@us.ibm.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2 years agoppc: conform to processor User's Manual for xscvdpspn
Paul A. Clarke [Mon, 19 Aug 2019 17:43:21 +0000 (12:43 -0500)] 
ppc: conform to processor User's Manual for xscvdpspn

The POWER8 and POWER9 User's Manuals specify the implementation
behavior for what the ISA leaves "undefined" behavior for the
xscvdpspn and xscvdpsp instructions.  This patch corrects the QEMU
implementation to match the hardware implementation for that case.

ISA 3.0B has xscvdpspn leaving its result in word 0 of the target register,
with the other words of the target register left "undefined".

The User's Manuals specify:
  VSX scalar convert from double-precision to single-precision (xscvdpsp,
  xscvdpspn).
  VSR[32:63] is set to VSR[0:31].
So, words 0 and 1 both contain the result.

Note: this is important because GCC as of version 8 or so, assumes and takes
advantage of this behavior to optimize the following sequence:
  xscvdpspn vs0,vs1
  mffprwz   r8,f0
ISA 3.0B has xscvdpspn leaving its result in word 0 of the target register,
and mffprwz expecting its input to come from word 1 of the source register.
This sequence fails with QEMU, as a shift is required between those two
instructions.  However, since the hardware splats the result to both words 0
and 1 of its output register, the shift is not necessary.

Expect a future revision of the ISA to specify this behavior.

Signed-off-by: Paul A. Clarke <pc@us.ibm.com>
v2
- Splitting patch "ppc: Three floating point fixes"; this is just one part.
- Updated commit message to clarify behavior is documented in User's Manuals.
- Updated commit message to correct which words are in output and source of
  xscvdpspn and mffprz.
- No source changes to this part of the original patch.

Message-Id: <1566236601-22954-1-git-send-email-pc@us.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2 years agoppc: Add support for 'mffsl' instruction
Paul A. Clarke [Fri, 16 Aug 2019 19:03:23 +0000 (14:03 -0500)] 
ppc: Add support for 'mffsl' instruction

ISA 3.0B added a set of Floating-Point Status and Control Register (FPSCR)
instructions: mffsce, mffscdrn, mffscdrni, mffscrn, mffscrni, mffsl.
This patch adds support for 'mffsl'.

'mffsl' is identical to 'mffs', except it only returns mode, status, and enable
bits from the FPSCR.

On CPUs without support for 'mffsl' (below ISA 3.0), the 'mffsl' instruction
will execute identically to 'mffs'.

Note: I renamed FPSCR_RN to FPSCR_RN0 so I could create an FPSCR_RN mask which
is both bits of the FPSCR rounding mode, as defined in the ISA.

I also fixed a typo in the definition of FPSCR_FR.

Signed-off-by: Paul A. Clarke <pc@us.ibm.com>
v4:
- nit: added some braces to resolve a checkpatch complaint.

v3:
- Changed tcg_gen_and_i64 to tcg_gen_andi_i64, eliminating the need for a
  temporary, per review from Richard Henderson.

v2:
- I found that I copied too much of the 'mffs' implementation.
  The 'Rc' condition code bits are not needed for 'mffsl'.  Removed.
- I now free the (renamed) 'tmask' temporary.
- I now bail early for older ISA to the original 'mffs' implementation.

Message-Id: <1565982203-11048-1-git-send-email-pc@us.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2 years agotarget/ppc: Add Directed Privileged Door-bell Exception State (DPDES) SPR
Alexey Kardashevskiy [Fri, 16 Aug 2019 06:17:33 +0000 (16:17 +1000)] 
target/ppc: Add Directed Privileged Door-bell Exception State (DPDES) SPR

DPDES stores a status of a doorbell message and if it is lost in
migration, the destination CPU won't receive it. This does not hit us
much as IPIs complete too quick to catch a pending one and even if
we missed one, broadcasts happen often enough to wake that CPU.

This defines DPDES and registers with KVM for migration.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20190816061733.53572-1-aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2 years agospapr/xive: Mask the EAS when allocating an IRQ
Cédric Le Goater [Tue, 13 Aug 2019 16:44:20 +0000 (18:44 +0200)] 
spapr/xive: Mask the EAS when allocating an IRQ

If an IRQ is allocated and not configured, such as a MSI requested by
a PCI driver, it can be saved in its default state and possibly later
on restored using the same state. If not initially MASKED, KVM will
try to find a matching priority/target tuple for the interrupt and
fail to restore the VM because 0/0 is not a valid target.

When allocating a IRQ number, the EAS should be set to a sane default :
VALID and MASKED.

Reported-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190813164420.9829-1-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2 years agospapr: Implement better workaround in spapr-vty device
Paul Mackerras [Wed, 31 Jul 2019 04:36:54 +0000 (14:36 +1000)] 
spapr: Implement better workaround in spapr-vty device

Linux guest kernels have code which scans the string of characters
returned from the H_GET_TERM_CHAR hypercall and removes any \0
character which comes immediately after a \r character.  This is to
work around a bug which was present in some ancient versions of
PowerVM.  In order to avoid the corruption of the console byte stream
that this introduced, commit 6c3bc244d3cb ("spapr: Implement bug in
spapr-vty device to be compatible with PowerVM") added a workaround
which adds a \0 character after every \r character.  Unfortunately,
this corrupts the console byte stream for those operating systems,
such as AIX, which don't remove the null bytes.

We can avoid triggering the Linux kernel workaround if we avoid
returning a buffer which contains a \0 after a \r.  We can do that by
breaking out of the loop in vty_getchars() if we are about to insert a
\0 and the previous character in the buffer is a \r.  That means we
return the characters up to the \r for the current H_GET_TERM_CHAR,
and the characters starting with the \0 for the next one.

With this workaround, we don't insert any spurious characters and we
avoid triggering the Linux kernel workaround, so the guest will
receive an uncorrupted stream whether or not they have the workaround.

Fixes: 6c3bc244d3cb ("spapr: Implement bug in spapr-vty device to be compatible with PowerVM")
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Message-Id: <20190731043653.shdi5sizjp4t65op@oak.ozlabs.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2 years agospapr/irq: Drop spapr_irq_msi_reset()
Greg Kurz [Fri, 26 Jul 2019 14:44:49 +0000 (16:44 +0200)] 
spapr/irq: Drop spapr_irq_msi_reset()

PHBs already take care of clearing the MSIs from the bitmap during reset
or unplug. No need to do this globally from the machine code. Rather add
an assert to ensure that PHBs have acted as expected.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <156415228966.1064338.190189424190233355.stgit@bahia.lan>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
[dwg: Fix crash in qtest case where spapr->irq_map can be NULL at the
 new assert()]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2 years agospapr/pci: Free MSIs during reset
Greg Kurz [Fri, 26 Jul 2019 14:44:44 +0000 (16:44 +0200)] 
spapr/pci: Free MSIs during reset

When the machine is reset, the MSI bitmap is cleared but the allocated
MSIs are not freed. Some operating systems, such as AIX, can detect the
previous configuration and assert.

Empty the MSI cache, this performs the needed cleanup.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <156415228410.1064338.4486161194061636096.stgit@bahia.lan>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2 years agospapr/pci: Consolidate de-allocation of MSIs
Greg Kurz [Fri, 26 Jul 2019 14:44:38 +0000 (16:44 +0200)] 
spapr/pci: Consolidate de-allocation of MSIs

When freeing MSIs, we need to:
- remove them from the machine's MSI bitmap
- remove them from the IC backend
- remove them from the PHB's MSI cache

This is currently open coded in two places in rtas_ibm_change_msi(),
and we're about to need this in spapr_phb_reset() as well. Instead of
duplicating this code again, make it a destroy function for the PHB's
MSI cache. Removing an MSI device from the cache will call the destroy
function internally.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <156415227855.1064338.5657793835271464648.stgit@bahia.lan>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2 years agoppc: remove idle_timer logic
Shivaprasad G Bhat [Thu, 25 Jul 2019 14:15:08 +0000 (09:15 -0500)] 
ppc: remove idle_timer logic

The logic is broken for multiple vcpu guests, also causing memory leak.
The logic is in place to handle kvm not having KVM_CAP_PPC_IRQ_LEVEL,
which is part of the kernel now since 2.6.37. Instead of fixing the
leak, drop the redundant logic which is not excercised on new kernels
anymore. Exit with error on older kernels.

Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Message-Id: <156406409479.19996.7606556689856621111.stgit@lep8c.aus.stglabs.ibm.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2 years agospapr: Implement ibm,suspend-me
Nicholas Piggin [Mon, 22 Jul 2019 06:17:52 +0000 (16:17 +1000)] 
spapr: Implement ibm,suspend-me

This has been useful to modify and test the Linux pseries suspend
code but it requires modification to the guest to call it (due to
being gated by other unimplemented features). It is not otherwise
used by Linux yet, but work is slowly progressing there.

This allows a (lightly modified) guest kernel to suspend with
`echo mem > /sys/power/state` and be resumed with system_wakeup
monitor command.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Message-Id: <20190722061752.22114-2-npiggin@gmail.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2 years agoi386: use machine class ->wakeup method
Nicholas Piggin [Mon, 22 Jul 2019 06:17:51 +0000 (16:17 +1000)] 
i386: use machine class ->wakeup method

Move the i386 suspend_wakeup logic out of the fallback path, and into
the new ->wakeup method.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Message-Id: <20190722061752.22114-1-npiggin@gmail.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2 years agomachine: Add wakeup method to MachineClass
Nicholas Piggin [Mon, 22 Jul 2019 05:32:13 +0000 (15:32 +1000)] 
machine: Add wakeup method to MachineClass

Waking from suspend is not logically a machine reset on all machines,
particularly in the paravirtualized case rather than hardware
emulated. The ppc spapr machine for example just invokes hypervisor
to suspend, and expects that call to return with the machine in the
same state (modulo some possible migration and reconfiguration
details).

Implement a machine ->wakeup method and use that if it exists.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Message-Id: <20190722053215.20808-2-npiggin@gmail.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2 years agoppc/xive: Improve 'info pic' support
Cédric Le Goater [Thu, 18 Jul 2019 11:54:11 +0000 (13:54 +0200)] 
ppc/xive: Improve 'info pic' support

Provide a better output of the XIVE END structures including the
escalation information and extend the PowerNV machine 'info pic'
command with a dump of the END EAS table used for escalations.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190718115420.19919-9-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2 years agoppc/xive: Provide silent escalation support
Cédric Le Goater [Thu, 18 Jul 2019 11:54:10 +0000 (13:54 +0200)] 
ppc/xive: Provide silent escalation support

When the 's' bit is set the escalation is said to be 'silent' or
'silent/gather'. In such configuration, the notification sequence is
skipped and only the escalation sequence is performed. This is used to
configure all the EQs of a vCPU to escalate on a single EQ which will
then target the hypervisor.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190718115420.19919-8-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2 years agoppc/xive: Provide unconditional escalation support
Cédric Le Goater [Thu, 18 Jul 2019 11:54:09 +0000 (13:54 +0200)] 
ppc/xive: Provide unconditional escalation support

When the 'u' bit is set the escalation is said to be 'unconditional'
which means that the ESe PQ bits are not used. Introduce a
xive_router_end_es_notify() routine to share code with the ESn
notification.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190718115420.19919-7-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2 years agoppc/xive: Provide escalation support
Cédric Le Goater [Thu, 18 Jul 2019 11:54:08 +0000 (13:54 +0200)] 
ppc/xive: Provide escalation support

If the XIVE presenter can not find the NVT dispatched on any of the HW
threads, it can not deliver the interrupt. XIVE offers an escalation
mechanism to handle such scenarios and inform the hypervisor that an
action should be taken.

Escalation is configured by setting the 'e' bit and the EAS in word 4
& 5 to let the HW look for the escalation END on which to trigger a
new event.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190718115420.19919-6-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2 years agoppc/xive: Provide backlog support
Cédric Le Goater [Thu, 18 Jul 2019 11:54:07 +0000 (13:54 +0200)] 
ppc/xive: Provide backlog support

If backlog is activated ('b' bit) on the END, the pending priority of
a missed event is recorded in the IPB field of the NVT for a later
resend.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190718115420.19919-5-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2 years agoppc/xive: Implement TM_PULL_OS_CTX special command
Cédric Le Goater [Thu, 18 Jul 2019 11:54:06 +0000 (13:54 +0200)] 
ppc/xive: Implement TM_PULL_OS_CTX special command

When a vCPU is not dispatched anymore on a HW thread, the Hypervisor
(KVM on Linux) invalidates the OS interrupt context of a vCPU with
this special command. It returns the OS CAM line value and resets the
VO bit.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190718115420.19919-4-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2 years agoppc/xive: use an abstract type for XiveNotifier
Cédric Le Goater [Thu, 18 Jul 2019 11:54:04 +0000 (13:54 +0200)] 
ppc/xive: use an abstract type for XiveNotifier

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190718115420.19919-2-clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2 years agopseries: Update SLOF firmware image
Alexey Kardashevskiy [Fri, 19 Jul 2019 01:47:59 +0000 (11:47 +1000)] 
pseries: Update SLOF firmware image

The only change that SLOF does not rely on QEMU providing an RTAS blob
and provides one itself:
https://git.qemu.org/?p=SLOF.git;a=commitdiff;h=5e4ed1fd0f39e

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2 years agospapr: initial implementation for H_TPM_COMM/spapr-tpm-proxy
Michael Roth [Wed, 17 Jul 2019 20:58:42 +0000 (15:58 -0500)] 
spapr: initial implementation for H_TPM_COMM/spapr-tpm-proxy

This implements the H_TPM_COMM hypercall, which is used by an
Ultravisor to pass TPM commands directly to the host's TPM device, or
a TPM Resource Manager associated with the device.

This also introduces a new virtual device, spapr-tpm-proxy, which
is used to configure the host TPM path to be used to service
requests sent by H_TPM_COMM hcalls, for example:

  -device spapr-tpm-proxy,id=tpmp0,host-path=/dev/tpmrm0

By default, no spapr-tpm-proxy will be created, and hcalls will return
H_FUNCTION.

The full specification for this hypercall can be found in
docs/specs/ppc-spapr-uv-hcalls.txt

Since SVM-related hcalls like H_TPM_COMM use a reserved range of
0xEF00-0xEF80, we introduce a separate hcall table here to handle
them.

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com
Message-Id: <20190717205842.17827-3-mdroth@linux.vnet.ibm.com>
[dwg: Corrected #include for upstream change]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2 years agodocs/specs: initial spec summary for Ultravisor-related hcalls
Michael Roth [Wed, 17 Jul 2019 20:58:41 +0000 (15:58 -0500)] 
docs/specs: initial spec summary for Ultravisor-related hcalls

For now this only covers hcalls relating to TPM communication since
it's the only one particularly important from a QEMU perspective atm,
but others can be added here where it makes sense.

The full specification for all hcalls/ucalls will eventually be made
available in the public/OpenPower version of the PAPR specification.

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Message-Id: <20190717205842.17827-2-mdroth@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2 years agospapr: Implement H_JOIN
Nicholas Piggin [Thu, 18 Jul 2019 03:42:14 +0000 (13:42 +1000)] 
spapr: Implement H_JOIN

This has been useful to modify and test the Linux pseries suspend
code but it requires modification to the guest to call it (due to
being gated by other unimplemented features). It is not otherwise
used by Linux yet, but work is slowly progressing there.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Message-Id: <20190718034214.14948-5-npiggin@gmail.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2 years agospapr: Implement H_CONFER
Nicholas Piggin [Thu, 18 Jul 2019 03:42:13 +0000 (13:42 +1000)] 
spapr: Implement H_CONFER

This does not do directed yielding and is not quite as strict as PAPR
specifies in terms of precise dispatch behaviour. This generally will
mean suboptimal performance, rather than guest misbehaviour. Linux
does not rely on exact dispatch behaviour.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Message-Id: <20190718034214.14948-4-npiggin@gmail.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2 years agospapr: Implement H_PROD
Nicholas Piggin [Thu, 18 Jul 2019 03:42:12 +0000 (13:42 +1000)] 
spapr: Implement H_PROD

H_PROD is added, and H_CEDE is modified to test the prod bit
according to PAPR.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Message-Id: <20190718034214.14948-3-npiggin@gmail.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2 years agospapr: Implement dispatch tracking for tcg
Nicholas Piggin [Thu, 18 Jul 2019 03:42:11 +0000 (13:42 +1000)] 
spapr: Implement dispatch tracking for tcg

Implement cpu_exec_enter/exit on ppc which calls into new methods of
the same name in PPCVirtualHypervisorClass. These are used by spapr
to implement the splpar VPA dispatch counter initially.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Message-Id: <20190718034214.14948-2-npiggin@gmail.com>
[dwg: Removed unnecessary CONFIG_USER_ONLY checks as suggested by gkurz]
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2 years agoppc: fix leak in h_client_architecture_support
Shivaprasad G Bhat [Wed, 17 Jul 2019 08:20:31 +0000 (03:20 -0500)] 
ppc: fix leak in h_client_architecture_support

Free all SpaprOptionVector local pointers after use.

Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Message-Id: <156335160761.82682.11912058325777251614.stgit@lep8c.aus.stglabs.ibm.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2 years agoppc: fix memory leak in spapr_dt_drc()
Shivaprasad G Bhat [Wed, 17 Jul 2019 08:20:01 +0000 (03:20 -0500)] 
ppc: fix memory leak in spapr_dt_drc()

Leaking the drc_name while preparing the DT properties.
Fixing that.

Also, remove the const qualifier from spapr_drc_name().

Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Message-Id: <156335159028.82682.5404622104535818162.stgit@lep8c.aus.stglabs.ibm.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2 years agoppc: fix memory leak in spapr_caps_add_properties
Shivaprasad G Bhat [Wed, 17 Jul 2019 08:19:43 +0000 (03:19 -0500)] 
ppc: fix memory leak in spapr_caps_add_properties

Free the capability name string after setting
the capability.

Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Message-Id: <156335156198.82682.8756968724044750843.stgit@lep8c.aus.stglabs.ibm.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2 years agotarget/ppc: Optimize emulation of vclzw instruction
Stefan Brankovic [Mon, 15 Jul 2019 14:22:52 +0000 (16:22 +0200)] 
target/ppc: Optimize emulation of vclzw instruction

Optimize Altivec instruction vclzw (Vector Count Leading Zeros Word).
This instruction counts the number of leading zeros of each word element
in source register and places result in the appropriate word element of
destination register.

Counting is to be performed in four iterations of for loop(one for each
word elemnt of source register vB). Every iteration consists of loading
appropriate word element from source register, counting leading zeros
with tcg_gen_clzi_i32, and saving the result in appropriate word element
of destination register.

Signed-off-by: Stefan Brankovic <stefan.brankovic@rt-rk.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <1563200574-11098-7-git-send-email-stefan.brankovic@rt-rk.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2 years agotarget/ppc: Optimize emulation of vclzd instruction
Stefan Brankovic [Mon, 15 Jul 2019 14:22:51 +0000 (16:22 +0200)] 
target/ppc: Optimize emulation of vclzd instruction

Optimize Altivec instruction vclzd (Vector Count Leading Zeros Doubleword).
This instruction counts the number of leading zeros of each doubleword element
in source register and places result in the appropriate doubleword element of
destination register.

Using tcg-s count leading zeros instruction two times(once for each
doubleword element of source register vB) and placing result in
appropriate doubleword element of destination register vD.

Signed-off-by: Stefan Brankovic <stefan.brankovic@rt-rk.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <1563200574-11098-6-git-send-email-stefan.brankovic@rt-rk.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2 years agotarget/ppc: Optimize emulation of vgbbd instruction
Stefan Brankovic [Mon, 15 Jul 2019 14:22:50 +0000 (16:22 +0200)] 
target/ppc: Optimize emulation of vgbbd instruction

Optimize altivec instruction vgbbd (Vector Gather Bits by Bytes by Doubleword)
All ith bits (i in range 1 to 8) of each byte of doubleword element in
source register are concatenated and placed into ith byte of appropriate
doubleword element in destination register.

Following solution is done for both doubleword elements of source register
in parallel, in order to reduce the number of instructions needed(that's why
arrays are used):
First, both doubleword elements of source register vB are placed in
appropriate element of array avr. Bits are gathered in 2x8 iterations(2 for
loops). In first iteration bit 1 of byte 1, bit 2 of byte 2,... bit 8 of
byte 8 are in their final spots so avr[i], i={0,1} can be and-ed with
tcg_mask. For every following iteration, both avr[i] and tcg_mask variables
have to be shifted right for 7 and 8 places, respectively, in order to get
bit 1 of byte 2, bit 2 of byte 3.. bit 7 of byte 8 in their final spots so
shifted avr values(saved in tmp) can be and-ed with new value of tcg_mask...
After first 8 iteration(first loop), all the first bits are in their final
places, all second bits but second bit from eight byte are in their places...
only 1 eight bit from eight byte is in it's place). In second loop we do all
operations symmetrically, in order to get other half of bits in their final
spots. Results for first and second doubleword elements are saved in
result[0] and result[1] respectively. In the end those results are saved in
appropriate doubleword element of destination register vD.

Signed-off-by: Stefan Brankovic <stefan.brankovic@rt-rk.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <1563200574-11098-5-git-send-email-stefan.brankovic@rt-rk.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2 years agotarget/ppc: move opcode decode tables to PowerPCCPU
Alex Bennée [Tue, 16 Jul 2019 12:13:52 +0000 (13:13 +0100)] 
target/ppc: move opcode decode tables to PowerPCCPU

The opcode decode tables aren't really part of the CPUPPCState but an
internal implementation detail for the translator. This can cause
problems with memcpy in cpu_copy as any table created during
ppc_cpu_realize get written over causing a memory leak. To avoid this
move the tables into PowerPCCPU which is better suited to hold
internal implementation details.

Attempts to fix: https://bugs.launchpad.net/qemu/+bug/1836558
Cc: 1836558@bugs.launchpad.net
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20190716121352.302-1-alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2 years agotarget/ppc: Optimize emulation of vsl and vsr instructions
Stefan Brankovic [Mon, 15 Jul 2019 14:22:48 +0000 (16:22 +0200)] 
target/ppc: Optimize emulation of vsl and vsr instructions

Optimization of altivec instructions vsl and vsr(Vector Shift Left/Rigt).
Perform shift operation (left and right respectively) on 128 bit value of
register vA by value specified in bits 125-127 of register vB. Lowest 3
bits in each byte element of register vB must be identical or result is
undefined.

For vsl instruction, the first step is bits 125-127 of register vB have
to be saved in variable sh. Then, the highest sh bits of the lower
doubleword element of register vA are saved in variable shifted,
in order not to lose those bits when shift operation is performed on
the lower doubleword element of register vA, which is the next
step. After shifting the lower doubleword element shift operation
is performed on higher doubleword element of vA, with replacement of
the lowest sh bits(that are now 0) with bits saved in shifted.

For vsr instruction, firstly, the bits 125-127 of register vB have
to be saved in variable sh. Then, the lowest sh bits of the higher
doubleword element of register vA are saved in variable shifted,
in odred not to lose those bits when the shift operation is
performed on the higher doubleword element of register vA, which is
the next step. After shifting higher doubleword element, shift operation
is performed on lower doubleword element of vA, with replacement of
highest sh bits(that are now 0) with bits saved in shifted.

Signed-off-by: Stefan Brankovic <stefan.brankovic@rt-rk.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <1563200574-11098-3-git-send-email-stefan.brankovic@rt-rk.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2 years agotarget/ppc: Optimize emulation of lvsl and lvsr instructions
Stefan Brankovic [Mon, 15 Jul 2019 14:22:47 +0000 (16:22 +0200)] 
target/ppc: Optimize emulation of lvsl and lvsr instructions

Adding simple macro that is calling tcg implementation of appropriate
instruction if altivec support is active.

Optimization of altivec instruction lvsl (Load Vector for Shift Left).
Place bytes sh:sh+15 of value 0x00 || 0x01 || 0x02 || ... || 0x1E || 0x1F
in destination register. Sh is calculated by adding 2 source registers and
getting bits 60-63 of result.

First, the bits [28-31] are placed from EA to variable sh. After that,
the bytes are created in the following way:
sh:(sh+7) of X(from description) by multiplying sh with 0x0101010101010101
followed by addition of the result with 0x0001020304050607. Value obtained
is placed in higher doubleword element of vD.
(sh+8):(sh+15) by adding the result of previous multiplication with
0x08090a0b0c0d0e0f. Value obtained is placed in lower doubleword element
of vD.

Optimization of altivec instruction lvsr (Load Vector for Shift Right).
Place bytes 16-sh:31-sh of value 0x00 || 0x01 || 0x02 || ... || 0x1E ||
0x1F in destination register. Sh is calculated by adding 2 source
registers and getting bits 60-63 of result.

First, the bits [28-31] are placed from EA to variable sh. After that,
the bytes are created in the following way:
sh:(sh+7) of X(from description) by multiplying sh with 0x0101010101010101
followed by substraction of the result from 0x1011121314151617. Value
obtained is placed in higher doubleword element of vD.
(sh+8):(sh+15) by substracting the result of previous multiplication from
0x18191a1b1c1d1e1f. Value obtained is placed in lower doubleword element
of vD.

Signed-off-by: Stefan Brankovic <stefan.brankovic@rt-rk.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <1563200574-11098-2-git-send-email-stefan.brankovic@rt-rk.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2 years agomigration: Do not re-read the clock on pre_save in case of paused guest
Maxiwell S. Garcia [Thu, 11 Jul 2019 19:47:02 +0000 (16:47 -0300)] 
migration: Do not re-read the clock on pre_save in case of paused guest

Re-read the timebase before migrate was ported from x86 commit:
   6053a86fe7bd: kvmclock: reduce kvmclock difference on migration

The clock move makes the guest knows about the paused time between
the stop and migrate commands. This is an issue in an already-paused
VM because some side effects, like process stalls, could happen
after migration.

So, this patch checks the runstate of guest in the pre_save handler and
do not re-reads the timebase in case of paused state (cold migration).

Signed-off-by: Maxiwell S. Garcia <maxiwell@linux.ibm.com>
Message-Id: <20190711194702.26598-1-maxiwell@linux.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2 years agospapr_pci: Allow 2MiB and 16MiB IOMMU pagesizes by default
David Gibson [Fri, 5 Jul 2019 05:03:05 +0000 (15:03 +1000)] 
spapr_pci: Allow 2MiB and 16MiB IOMMU pagesizes by default

We've had the qemu and kernel KVM infrastructure to handle larger TCE
page sizes for a while, but forgot to update the defaults to actually
allow them.  This turns that change on.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2 years agohw: add compat machines for 4.2
Cornelia Huck [Wed, 24 Jul 2019 10:35:24 +0000 (12:35 +0200)] 
hw: add compat machines for 4.2

Add 4.2 machine types for arm/i440fx/q35/s390x/spapr.

For i440fx and q35, unversioned cpu models are still translated
to -v1, as 0788a56bd1ae ("i386: Make unversioned CPU models be
aliases") states this should only transition to the latest cpu
model version in 4.3 (or later).

Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <20190724103524.20916-1-cohuck@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2 years agospapr_iommu: Fix xlate trace to print translated address
Alexey Kardashevskiy [Mon, 12 Aug 2019 05:42:02 +0000 (15:42 +1000)] 
spapr_iommu: Fix xlate trace to print translated address

Currently we basically print IO address twice, fix this.

Fixes: 7e472264e9e2 ("PPC: spapr: iommu: rework traces")
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20190812054202.125492-1-aik@ozlabs.ru>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2 years agospapr: quantify error messages regarding capability settings
Daniel Black [Mon, 12 Aug 2019 07:10:44 +0000 (17:10 +1000)] 
spapr: quantify error messages regarding capability settings

Its not immediately obvious how cap-X=Y setting need to be applied
to the command line so, for spapr capability error messages, this
has been clarified to:

 appending -machine cap-X=Y

The wrong value messages have been left as is, as the user has found
the right location.

Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Daniel Black <daniel@linux.ibm.com>
Message-Id: <20190812071044.30806-1-daniel@linux.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2 years agox86: Intel AVX512_BF16 feature enabling
Jing Liu [Thu, 25 Jul 2019 06:14:16 +0000 (14:14 +0800)] 
x86: Intel AVX512_BF16 feature enabling

Intel CooperLake cpu adds AVX512_BF16 instruction, defining as
CPUID.(EAX=7,ECX=1):EAX[bit 05].

The patch adds a property for setting the subleaf of CPUID leaf 7 in
case that people would like to specify it.

The release spec link as follows,
https://software.intel.com/sites/default/files/managed/c5/15/\
architecture-instruction-set-extensions-programming-reference.pdf

Signed-off-by: Jing Liu <jing2.liu@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoscsi: lsi: exit infinite loop while executing script (CVE-2019-12068)
Paolo Bonzini [Wed, 14 Aug 2019 12:05:21 +0000 (17:35 +0530)] 
scsi: lsi: exit infinite loop while executing script (CVE-2019-12068)

When executing script in lsi_execute_script(), the LSI scsi adapter
emulator advances 's->dsp' index to read next opcode. This can lead
to an infinite loop if the next opcode is empty. Move the existing
loop exit after 10k iterations so that it covers no-op opcodes as
well.

Reported-by: Bugs SysSec <bugs-syssec@rub.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agotest-bitmap: test set 1 bit case for bitmap_set
Wei Yang [Wed, 14 Aug 2019 00:27:23 +0000 (08:27 +0800)] 
test-bitmap: test set 1 bit case for bitmap_set

All current bitmap_set test cases set range across word, while the
handle of a range within one word is different from that.

Add case to set 1 bit as a represent for set range within one word.

Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agomigration: do not rom_reset() during incoming migration
Catherine Ho [Mon, 8 Apr 2019 08:42:13 +0000 (04:42 -0400)] 
migration: do not rom_reset() during incoming migration

Commit 18269069c310 ("migration: Introduce ignore-shared capability")
addes ignore-shared capability to bypass the shared ramblock (e,g,
membackend + numa node). It does good to live migration.

As told by Yury,this commit expectes that QEMU doesn't write to guest RAM
until VM starts, but it does on aarch64 qemu:
Backtrace:
1  0x000055f4a296dd84 in address_space_write_rom_internal () at
exec.c:3458
2  0x000055f4a296de3a in address_space_write_rom () at exec.c:3479
3  0x000055f4a2d519ff in rom_reset () at hw/core/loader.c:1101
4  0x000055f4a2d475ec in qemu_devices_reset () at hw/core/reset.c:69
5  0x000055f4a2c90a28 in qemu_system_reset () at vl.c:1675
6  0x000055f4a2c9851d in main () at vl.c:4552

Actually, on arm64 virt marchine, ramblock "dtb" will be filled into ram
druing rom_reset. In ignore-shared incoming case, this rom filling
is not required since all the data has been stored in memory backend
file.

Further more, as suggested by Peter Xu, if we do rom_reset() now with
these ROMs then the RAM data should be re-filled again too with the
migration stream coming in.

Fixes: commit 18269069c310 ("migration: Introduce ignore-shared
capability")
Suggested-by: Yury Kotov <yury-kotov@yandex-team.ru>
Suggested-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Catherine Ho <catherine.hecx@gmail.com>
Acked-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoHACKING: Document 'struct' keyword usage
Eduardo Habkost [Mon, 12 Aug 2019 23:46:30 +0000 (20:46 -0300)] 
HACKING: Document 'struct' keyword usage

Sometimes we use the 'struct' keyword in headers to help us
reduce dependencies between header files.  Document that
practice.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agokvm: vmxcap: Enhance with latest features
Jan Kiszka [Tue, 13 Aug 2019 06:29:33 +0000 (08:29 +0200)] 
kvm: vmxcap: Enhance with latest features

Based on SDM from May 2019.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agocpus-common: nuke finish_safe_work
Roman Kagan [Thu, 23 May 2019 10:54:48 +0000 (10:54 +0000)] 
cpus-common: nuke finish_safe_work

It was introduced in commit ab129972c8b41e15b0521895a46fd9c752b68a5e,
with the following motivation:

  Because start_exclusive uses CPU_FOREACH, merge exclusive_lock with
  qemu_cpu_list_lock: together with a call to exclusive_idle (via
  cpu_exec_start/end) in cpu_list_add, this protects exclusive work
  against concurrent CPU addition and removal.

However, it seems to be redundant, because the cpu-exclusive
infrastructure provides suffificent protection against the newly added
CPU starting execution while the cpu-exclusive work is running, and the
aforementioned traversing of the cpu list is protected by
qemu_cpu_list_lock.

Besides, this appears to be the only place where the cpu-exclusive
section is entered with the BQL taken, which has been found to trigger
AB-BA deadlock as follows:

    vCPU thread                             main thread
    -----------                             -----------
async_safe_run_on_cpu(self,
                      async_synic_update)
...                                         [cpu hot-add]
process_queued_cpu_work()
  qemu_mutex_unlock_iothread()
                                            [grab BQL]
  start_exclusive()                         cpu_list_add()
  async_synic_update()                        finish_safe_work()
    qemu_mutex_lock_iothread()                  cpu_exec_start()

So remove it.  This paves the way to establishing a strict nesting rule
of never entering the exclusive section with the BQL taken.

Signed-off-by: Roman Kagan <rkagan@virtuozzo.com>
Message-Id: <20190523105440.27045-2-rkagan@virtuozzo.com>

2 years agoicount: remove unnecessary gen_io_end calls
Pavel Dovgalyuk [Thu, 25 Jul 2019 08:44:55 +0000 (11:44 +0300)] 
icount: remove unnecessary gen_io_end calls

Prior patch resets can_do_io flag at the TB entry. Therefore there is no
need in resetting this flag at the end of the block.
This patch removes redundant gen_io_end calls.

Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru>
Message-Id: <156404429499.18669.13404064982854123855.stgit@pasha-Precision-3630-Tower>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@gmail.com>
2 years agoicount: clean up cpu_can_io at the entry to the block
Pavel Dovgalyuk [Thu, 25 Jul 2019 08:44:49 +0000 (11:44 +0300)] 
icount: clean up cpu_can_io at the entry to the block

Most of IO instructions can be executed only at the end of the block in
icount mode. Therefore translator can set cpu_can_io flag when translating
the last instruction.
But when the blocks are chained, then this flag is not reset and may
remain set at the beginning of the next block.
This patch resets the flag at the entry of any translation block,
making I/O operations impossible by default.

Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru>
--

v2 changes:
 - reset can_do_io at the start of every TB (suggested by Paolo Bonzini)
Message-Id: <156404428943.18669.15747009371169578935.stgit@pasha-Precision-3630-Tower>

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoreplay: rename step-related variables and functions
Pavel Dovgalyuk [Thu, 25 Jul 2019 08:44:43 +0000 (11:44 +0300)] 
replay: rename step-related variables and functions

This patch renames replay_get_current_step() and related variables
to make these names consistent with existing 'icount' command line
option and future record/replay hmp/qmp commands.

Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru>
Message-Id: <156404428377.18669.15476429889039912070.stgit@pasha-Precision-3630-Tower>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoreplay: refine replay-time module
Pavel Dovgalyuk [Thu, 25 Jul 2019 08:44:38 +0000 (11:44 +0300)] 
replay: refine replay-time module

This patch removes refactoring artifacts from the replay/replay-time.c

Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru>
Message-Id: <156404427799.18669.8072341590511911277.stgit@pasha-Precision-3630-Tower>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoreplay: fix replay shutdown
Pavel Dovgalyuk [Thu, 25 Jul 2019 08:44:32 +0000 (11:44 +0300)] 
replay: fix replay shutdown

This patch fixes shutdown of the replay process, which is terminated with
the assert when shutdown event is read from the log.
replay_finish_event reads new data_kind and therefore the value of data_kind
should be preserved to be valid at qemu_system_shutdown_request call.

Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru>
Message-Id: <156404427238.18669.12378772823692338069.stgit@pasha-Precision-3630-Tower>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoutil/qemu-timer: refactor deadline calculation for external timers
Pavel Dovgalyuk [Thu, 25 Jul 2019 08:44:26 +0000 (11:44 +0300)] 
util/qemu-timer: refactor deadline calculation for external timers

icount-based record/replay uses qemu_clock_deadline_ns_all to measure
the period until vCPU may be interrupted.
This function takes in account the virtual timers, because they belong
to the virtual devices that may generate interrupt request or affect
the virtual machine state.
However, there are a subset of virtual timers, that are marked with
'external' flag. These do not change the virtual machine state and
only based on virtual clock. Calculating the deadling using the external
timers breaks the determinism, because they do not belong to the replayed
part of the virtual machine.
This patch fixes the deadline calculation for this case by adding
new parameter for skipping the external timers when it is needed.

Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru>
--

v2 changes:
 - added new parameter for timer attribute mask
Message-Id: <156404426682.18669.17014100602930969222.stgit@pasha-Precision-3630-Tower>

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoreplay: document development rules
Pavel Dovgalyuk [Thu, 25 Jul 2019 08:44:21 +0000 (11:44 +0300)] 
replay: document development rules

This patch introduces docs/devel/replay.txt which describes the rules
that should be followed to make virtual devices usable in record/replay mode.

Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgauk@ispras.ru>
--

v9: fixed external virtual clock description (reported by Artem Pisarenko)
Message-Id: <156404426119.18669.6707258931552832854.stgit@pasha-Precision-3630-Tower>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru>
2 years agoreplay: add missing fix for internal function
Pavel Dovgalyuk [Thu, 25 Jul 2019 08:44:15 +0000 (11:44 +0300)] 
replay: add missing fix for internal function

This is a fix which was missed by patch
74c0b816adfc6aa1b01b4426fdf385e32e35cbac, which added current_step
parameter to the replay_advance_current_step function.

Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru>
Message-Id: <156404425561.18669.13015037579222450241.stgit@pasha-Precision-3630-Tower>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agotimer: last, remove last bits of last
Dr. David Alan Gilbert [Wed, 24 Jul 2019 11:58:23 +0000 (12:58 +0100)] 
timer: last, remove last bits of last

The reset notifiers kept a 'last' counter to notice jumps;
now that we've remove the notifier we don't need to keep 'last'.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20190724115823.4199-5-dgilbert@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoreplay: Remove host_clock_last
Dr. David Alan Gilbert [Wed, 24 Jul 2019 11:58:22 +0000 (12:58 +0100)] 
replay: Remove host_clock_last

Now we're not using the 'last' field in the timer, remove it from
replay.

Bump the version number of the replay structure since we've
removed the field.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20190724115823.4199-4-dgilbert@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agotimer: Remove reset notifiers
Dr. David Alan Gilbert [Wed, 24 Jul 2019 11:58:21 +0000 (12:58 +0100)] 
timer: Remove reset notifiers

Remove the reset notifer from the core qemu-timer code.
The only user was mc146818 and we've just remove it's use.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20190724115823.4199-3-dgilbert@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agomc146818rtc: Remove reset notifiers
Dr. David Alan Gilbert [Wed, 24 Jul 2019 11:58:20 +0000 (12:58 +0100)] 
mc146818rtc: Remove reset notifiers

The reset notifiers are unreliable and recalculating the offsets
after boot causes problems with migration in cases where explicit
base times are set on the destination.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20190724115823.4199-2-dgilbert@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agomemory: fix race between TCG and accesses to dirty bitmap
Paolo Bonzini [Tue, 6 Feb 2018 17:37:39 +0000 (18:37 +0100)] 
memory: fix race between TCG and accesses to dirty bitmap

There is a race between TCG and accesses to the dirty log:

      vCPU thread                  reader thread
      -----------------------      -----------------------
      TLB check -> slow path
        notdirty_mem_write
          write to RAM
          set dirty flag
                                   clear dirty flag
      TLB check -> fast path
                                   read memory
        write to RAM

Fortunately, in order to fix it, no change is required to the
vCPU thread.  However, the reader thread must delay the read after
the vCPU thread has finished the write.  This can be approximated
conservatively by run_on_cpu, which waits for the end of the current
translation block.

A similar technique is used by KVM, which has to do a synchronous TLB
flush after doing a test-and-clear of the dirty-page flags.

Reported-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agotarget/i386: Return 'indefinite integer value' for invalid SSE fp->int conversions
Peter Maydell [Mon, 5 Aug 2019 18:03:32 +0000 (19:03 +0100)] 
target/i386: Return 'indefinite integer value' for invalid SSE fp->int conversions

The x86 architecture requires that all conversions from floating
point to integer which raise the 'invalid' exception (infinities of
both signs, NaN, and all values which don't fit in the destination
integer) return what the x86 spec calls the "indefinite integer
value", which is 0x8000_0000 for 32-bits or 0x8000_0000_0000_0000 for
64-bits.  The softfloat functions return the more usual behaviour of
positive overflows returning the maximum value that fits in the
destination integer format and negative overflows returning the
minimum value that fits.

Wrap the softfloat functions in x86-specific versions which
detect the 'invalid' condition and return the indefinite integer.

Note that we don't use these wrappers for the 3DNow! pf2id and pf2iw
instructions, which do return the minimum value that fits in
an int32 if the input float is a large negative number.

Fixes: https://bugs.launchpad.net/qemu/+bug/1815423
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20190805180332.10185-1-peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoi386/kvm: initialize struct at full before ioctl call
Andrey Shinkevich [Tue, 30 Jul 2019 16:01:38 +0000 (19:01 +0300)] 
i386/kvm: initialize struct at full before ioctl call

Not the whole structure is initialized before passing it to the KVM.
Reduce the number of Valgrind reports.

Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Message-Id: <1564502498-805893-4-git-send-email-andrey.shinkevich@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agotests: Fix uninitialized byte in test_visitor_in_fuzz
Andrey Shinkevich [Tue, 30 Jul 2019 16:01:37 +0000 (19:01 +0300)] 
tests: Fix uninitialized byte in test_visitor_in_fuzz

One byte in the local buffer stays uninitialized, at least with the
first iteration, because of the double decrement in the
test_visitor_in_fuzz(). This is what Valgrind does not like and not
critical for the test itself. So, reduce the number of the memory
issues reports.

Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Message-Id: <1564502498-805893-3-git-send-email-andrey.shinkevich@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agotest-throttle: Fix uninitialized use of burst_length
Andrey Shinkevich [Tue, 30 Jul 2019 16:01:36 +0000 (19:01 +0300)] 
test-throttle: Fix uninitialized use of burst_length

ThrottleState::cfg of the static variable 'ts' is reassigned with the
local one in the do_test_accounting() and then is passed to the
throttle_account() with uninitialized member LeakyBucket::burst_length.

Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Message-Id: <1564502498-805893-2-git-send-email-andrey.shinkevich@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agotarget-i386: kvm: 'kvm_get_supported_msrs' cleanup
Li Qiang [Thu, 25 Jul 2019 15:16:39 +0000 (08:16 -0700)] 
target-i386: kvm: 'kvm_get_supported_msrs' cleanup

Function 'kvm_get_supported_msrs' is only called once
now, get rid of the static variable 'kvm_supported_msrs'.

Signed-off-by: Li Qiang <liq3ea@163.com>
Message-Id: <20190725151639.21693-1-liq3ea@163.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years ago9p: simplify source file selection
Paolo Bonzini [Thu, 25 Jul 2019 10:03:30 +0000 (12:03 +0200)] 
9p: simplify source file selection

Express the complex conditions in Kconfig rather than Makefiles, since Kconfig
is better suited at expressing dependencies and detecting contradictions.

Cc: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoconfigure: Define target access alignment in configure
tony.nguyen@bt.com [Thu, 18 Jul 2019 06:01:31 +0000 (06:01 +0000)] 
configure: Define target access alignment in configure

This patch moves the define of target access alignment earlier from
target/foo/cpu.h to configure.

Suggested in Richard Henderson's reply to "[PATCH 1/4] tcg: TCGMemOp is now
accelerator independent MemOp"

Signed-off-by: Tony Nguyen <tony.nguyen@bt.com>
Message-Id: <11e818d38ebc40e986cfa62dd7d0afdc@tpw09926dag18e.domain1.systemhost.net>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: tony.nguyen@bt.com <tony.nguyen@bt.com>
2 years agomemory: assert on out of scope notification
Yan Zhao [Tue, 25 Jun 2019 03:21:18 +0000 (11:21 +0800)] 
memory: assert on out of scope notification

It is wrong for an entry to have parts out of scope of notifier's range.
assert this condition.

Out of scope mapping/unmapping would cause problem, as in below case:

1. initially there are two notifiers with ranges
0-0xfedfffff, 0xfef00000-0xffffffffffffffff,
IOVAs from 0x3c000000 - 0x3c1fffff is in shadow page table.

2. in vfio, memory_region_register_iommu_notifier() is followed by
memory_region_iommu_replay(), which will first call address space
unmap,
and walk and add back all entries in vtd shadow page table. e.g.
(1) for notifier 0-0xfedfffff,
    IOVAs from 0 - 0xffffffff get unmapped,
    and IOVAs from 0x3c000000 - 0x3c1fffff get mapped
(2) for notifier 0xfef00000-0xffffffffffffffff
    IOVAs from 0 - 0x7fffffffff get unmapped,
    but IOVAs from 0x3c000000 - 0x3c1fffff cannot get mapped back.

Cc: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
Message-Id: <1561432878-13754-1-git-send-email-yan.y.zhao@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agohw/i386/pc: Map into memory the initrd
Stefano Garzarella [Wed, 24 Jul 2019 14:31:05 +0000 (16:31 +0200)] 
hw/i386/pc: Map into memory the initrd

In order to reduce the memory footprint we map into memory
the initrd using g_mapped_file_new() instead of reading it.
In this way we can share the initrd pages between multiple
instances of QEMU.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20190724143105.307042-4-sgarzare@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoelf-ops.h: Map into memory the ELF to load
Stefano Garzarella [Wed, 24 Jul 2019 14:31:04 +0000 (16:31 +0200)] 
elf-ops.h: Map into memory the ELF to load

In order to reduce the memory footprint we map into memory
the ELF to load using g_mapped_file_new_from_fd() instead of
reading each sections. In this way we can share the ELF pages
between multiple instances of QEMU.

Suggested-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20190724143105.307042-3-sgarzare@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoloader: Handle memory-mapped ELFs
Stefano Garzarella [Wed, 24 Jul 2019 14:31:03 +0000 (16:31 +0200)] 
loader: Handle memory-mapped ELFs

This patch allows handling an ELF memory-mapped, taking care
the reference count of the GMappedFile* passed through
rom_add_elf_program().
In this case, the 'data' pointer is not heap-allocated, so
we cannot free it.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20190724143105.307042-2-sgarzare@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agotarget-i386: adds PV_SCHED_YIELD CPUID feature bit
Wanpeng Li [Wed, 10 Jul 2019 08:02:51 +0000 (16:02 +0800)] 
target-i386: adds PV_SCHED_YIELD CPUID feature bit

Adds PV_SCHED_YIELD CPUID feature bit.

Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Message-Id: <1562745771-8414-1-git-send-email-wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agokvm: i386: halt poll control MSR support
Marcelo Tosatti [Mon, 3 Jun 2019 23:04:08 +0000 (20:04 -0300)] 
kvm: i386: halt poll control MSR support

Add support for halt poll control MSR: save/restore, migration
and new feature name.

The purpose of this MSR is to allow the guest to disable
host halt poll.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Message-Id: <20190603230408.GA7938@amt.cnet>
[Do not enable by default, as pointed out by Mark Kanda. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoMerge remote-tracking branch 'remotes/huth-gitlab/tags/pull-request-2019-08-20' into...
Peter Maydell [Tue, 20 Aug 2019 13:14:20 +0000 (14:14 +0100)] 
Merge remote-tracking branch 'remotes/huth-gitlab/tags/pull-request-2019-08-20' into staging

- Improvements for the Kconfig switches and Makefiles

# gpg: Signature made Tue 20 Aug 2019 08:26:41 BST
# gpg:                using RSA key 27B88847EEE0250118F3EAB92ED9D774FE702DB5
# gpg:                issuer "thuth@redhat.com"
# gpg: Good signature from "Thomas Huth <th.huth@gmx.de>" [full]
# gpg:                 aka "Thomas Huth <thuth@redhat.com>" [full]
# gpg:                 aka "Thomas Huth <huth@tuxfamily.org>" [full]
# gpg:                 aka "Thomas Huth <th.huth@posteo.de>" [unknown]
# Primary key fingerprint: 27B8 8847 EEE0 2501 18F3  EAB9 2ED9 D774 FE70 2DB5

* remotes/huth-gitlab/tags/pull-request-2019-08-20:
  hw/core: Add a config switch for the generic loader device
  hw/misc: Add a config switch for the "unimplemented" device
  hw/core: Add a config switch for the "split-irq" device
  hw/core: Add a config switch for the "or-irq" device
  hw/core: Add a config switch for the "register" device
  hw/dma: Do not build the xlnx_dpdma device for the MicroBlaze machines
  hw/intc: Only build the xlnx-iomod-intc device for the MicroBlaze PMU
  hw/Kconfig: Move the generic XLNX_ZYNQMP to the root hw/Kconfig

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2 years agoMerge remote-tracking branch 'remotes/amarkovic/tags/mips-queue-aug-20-2019' into...
Peter Maydell [Tue, 20 Aug 2019 12:40:48 +0000 (13:40 +0100)] 
Merge remote-tracking branch 'remotes/amarkovic/tags/mips-queue-aug-20-2019' into staging

MIPS queue for August 20th, 2019

# gpg: Signature made Mon 19 Aug 2019 19:07:18 BST
# gpg:                using RSA key D4972A8967F75A65
# gpg: Good signature from "Aleksandar Markovic <amarkovic@wavecomp.com>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 8526 FBF1 5DA3 811F 4A01  DD75 D497 2A89 67F7 5A65

* remotes/amarkovic/tags/mips-queue-aug-20-2019:
  target/mips: tests/tcg: Fix target configurations for MSA tests
  target/mips: tests/tcg: Add optional printing of more detailed failure info
  target/mips: Style improvements in mips_mipssim.c
  target/mips: Style improvements in mips_malta.c
  target/mips: Style improvements in mips_int.c
  target/mips: Style improvements in mips_fulong2e.c
  target/mips: Style improvements in cps.c
  target/mips: Style improvements in translate.c
  target/mips: Style improvements in machine.c
  target/mips: Style improvements in cpu.c
  target/mips: Style improvements in cp0_timer.c

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2 years agoMerge remote-tracking branch 'remotes/maxreitz/tags/pull-block-2019-08-19' into staging
Peter Maydell [Tue, 20 Aug 2019 09:27:24 +0000 (10:27 +0100)] 
Merge remote-tracking branch 'remotes/maxreitz/tags/pull-block-2019-08-19' into staging

Block patches:
- preallocation=falloc/full support for LUKS
- Various minor fixes

# gpg: Signature made Mon 19 Aug 2019 16:36:45 BST
# gpg:                using RSA key 91BEB60A30DB3E8857D11829F407DB0061D5CF40
# gpg:                issuer "mreitz@redhat.com"
# gpg: Good signature from "Max Reitz <mreitz@redhat.com>" [full]
# Primary key fingerprint: 91BE B60A 30DB 3E88 57D1  1829 F407 DB00 61D5 CF40

* remotes/maxreitz/tags/pull-block-2019-08-19:
  doc: Preallocation does not require writing zeroes
  iotests: Fix 141 when run with qed
  vpc: Do not return RAW from block_status
  vmdk: Make block_status recurse for flat extents
  vdi: Make block_status recurse for fixed images
  iotests: Full mirror to existing non-zero image
  iotests: Test convert -n to pre-filled image
  iotests: Convert to preallocated encrypted qcow2
  vhdx: Fix .bdrv_has_zero_init()
  vdi: Fix .bdrv_has_zero_init()
  qcow2: Fix .bdrv_has_zero_init()
  block: Use bdrv_has_zero_init_truncate()
  block: Implement .bdrv_has_zero_init_truncate()
  block: Add bdrv_has_zero_init_truncate()
  mirror: Fix bdrv_has_zero_init() use
  qemu-img: Fix bdrv_has_zero_init() use in convert
  LUKS: support preallocation

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2 years agohw/core: Add a config switch for the generic loader device
Thomas Huth [Tue, 30 Jul 2019 13:40:50 +0000 (15:40 +0200)] 
hw/core: Add a config switch for the generic loader device

The generic loader device is completely optional. Let's add a proper
config switch for it so that people can disable it if they don't need
it and want to create a minimalistic QEMU binary.

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20190817101931.28386-9-thuth@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
2 years agohw/misc: Add a config switch for the "unimplemented" device
Thomas Huth [Tue, 14 May 2019 05:26:53 +0000 (07:26 +0200)] 
hw/misc: Add a config switch for the "unimplemented" device

The device is only used by some few boards. Let's use a proper Kconfig
switch so that we only compile this code if we really need it.

Message-Id: <20190817101931.28386-8-thuth@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
2 years agohw/core: Add a config switch for the "split-irq" device
Thomas Huth [Tue, 14 May 2019 08:24:28 +0000 (10:24 +0200)] 
hw/core: Add a config switch for the "split-irq" device

The "split-irq" device is currently only used by machines that use
CONFIG_ARMSSE. Let's add a proper CONFIG_SPLIT_IRQ switch for this
so that it only gets compiled when we really need it.

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20190817101931.28386-7-thuth@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
2 years agohw/core: Add a config switch for the "or-irq" device
Thomas Huth [Tue, 14 May 2019 06:13:28 +0000 (08:13 +0200)] 
hw/core: Add a config switch for the "or-irq" device

The "or-irq" device is only used by certain machines. Let's add
a proper config switch for it so that it only gets compiled when we
really need it.

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20190817101931.28386-6-thuth@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
2 years agohw/core: Add a config switch for the "register" device
Thomas Huth [Tue, 14 May 2019 05:59:34 +0000 (07:59 +0200)] 
hw/core: Add a config switch for the "register" device

The "register" device is only used by certain machines. Let's add
a proper config switch for it so that it only gets compiled when we
really need it.

Message-Id: <20190817101931.28386-5-thuth@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
2 years agohw/dma: Do not build the xlnx_dpdma device for the MicroBlaze machines
Philippe Mathieu-Daudé [Sat, 27 Apr 2019 14:14:59 +0000 (16:14 +0200)] 
hw/dma: Do not build the xlnx_dpdma device for the MicroBlaze machines

The xlnx_dpdma device is only used by the ZynqMP AArch64 machine
(not the MicroBlaze PMU). Remove it from the ZynqMP generic objects.
(Note, this entry was duplicated for the AArch64).

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20190427141459.19728-4-philmd@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
2 years agohw/intc: Only build the xlnx-iomod-intc device for the MicroBlaze PMU
Philippe Mathieu-Daudé [Sat, 27 Apr 2019 14:14:58 +0000 (16:14 +0200)] 
hw/intc: Only build the xlnx-iomod-intc device for the MicroBlaze PMU

The Xilinx I/O Module Interrupt Controller is only used by the
MicroBlaze PMU, not by the AArch64 machine.
Move it from the generic ZynqMP object list to the PMU specific.

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20190427141459.19728-3-philmd@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>