skiboot.git
3 days ago ubuntu:rolling now missing libcrypto.so.1.0.0, remove p8 mambo [master] [github/master]
Stewart Smith [Tue, 21 May 2019 01:54:40 +0000 (11:54 +1000)] 
ubuntu:rolling now missing libcrypto.so.1.0.0, remove p8 mambo

Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
3 days ago With new GCC comes larger GCOV binaries
Stewart Smith [Tue, 21 May 2019 01:44:23 +0000 (11:44 +1000)] 
With new GCC comes larger GCOV binaries

So we need to change our heap size to make more room for data/bss
without having to change where the console is or have more fun moving
things about.

Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
3 days ago Intentionally discard fini_array sections
Stewart Smith [Tue, 21 May 2019 01:29:38 +0000 (11:29 +1000)] 
Intentionally discard fini_array sections

Produced in a SKIBOOT_GCOV=1 build, and never called by skiboot.

Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
3 days ago opal-ci: Add Fedora 30
Stewart Smith [Mon, 20 May 2019 23:52:16 +0000 (09:52 +1000)] 
opal-ci: Add Fedora 30

Disable Fedora30 on ppc64le due to mysterious failures

Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
4 days ago hw/xive.c: Fix memcmp() in DEBUG build to compare struct not ptr
Stewart Smith [Mon, 20 May 2019 04:45:32 +0000 (14:45 +1000)] 
hw/xive.c: Fix memcmp() in DEBUG build to compare struct not ptr

With GCC9:

hw/xive.c: In function ‘xive_check_eq_update’:
hw/xive.c:3034:29: error: argument to ‘sizeof’ in ‘__builtin_memcmp’ call is the same expression as the first source; did you mean to dereference it? [-Werror=sizeof-pointer-memaccess]
  if (memcmp(eq, &eq2, sizeof(eq)) != 0) {
                             ^
hw/xive.c: In function ‘xive_check_vpc_update’:
hw/xive.c:3056:29: error: argument to ‘sizeof’ in ‘__builtin_memcmp’ call is the same expression as the first source; did you mean to dereference it? [-Werror=sizeof-pointer-memaccess]
  if (memcmp(vp, &vp2, sizeof(vp)) != 0) {
                             ^
cc1: all warnings being treated as errors
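
The difference GCC 9 is pointing at can be reproduced in isolation. The sketch below is illustrative only: `struct demo_eq` and both function names are hypothetical stand-ins, not the XIVE structures or the code in hw/xive.c. The pointer-sized length is computed in a separate variable so the sketch itself builds without triggering the same warning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a queue descriptor; not the real XIVE layout. */
struct demo_eq {
	uint32_t w[8];	/* 32 bytes, larger than any pointer */
};

/* Pre-fix behaviour: the length is the size of the pointer (4 or 8 bytes),
 * so changes beyond the first few bytes go unnoticed. */
bool eq_changed_ptr_size(const struct demo_eq *eq, const struct demo_eq *eq2)
{
	size_t len = sizeof(eq);

	assert(eq && eq2);
	return memcmp(eq, eq2, len) != 0;
}

/* Post-fix behaviour: dereference so the whole structure is compared. */
bool eq_changed_struct_size(const struct demo_eq *eq, const struct demo_eq *eq2)
{
	assert(eq && eq2);
	return memcmp(eq, eq2, sizeof(*eq)) != 0;
}
```

With two descriptors that differ only in the last word, the first function reports no change while the second one detects it.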

Fixes: 2eea386767728
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
4 days ago external/trace: Add follow option to dump_trace
Jordan Niethe [Mon, 1 Apr 2019 23:43:27 +0000 (10:43 +1100)] 
external/trace: Add follow option to dump_trace

When monitoring traces, an option like the tail command's '-f' (follow)
is very useful. This option continues to append to the output as more
data arrives. Add an '-f' option to allow dump_trace to operate
similarly.

Tail also provides a '-s' (sleep time) option that
accompanies '-f'.  This controls how often new input will be polled. Add
a '-s' option that will make dump_trace sleep for N milliseconds before
checking for new input.

Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
4 days ago external/trace: Add support for dumping multiple buffers
Jordan Niethe [Mon, 1 Apr 2019 23:43:26 +0000 (10:43 +1100)] 
external/trace: Add support for dumping multiple buffers

dump_trace can only dump one trace buffer at a time. It would be handy
to be able to dump multiple buffers and to see the entries from these
buffers displayed in correct timestamp order. Each trace buffer is
already sorted by timestamp so use a heap to implement an efficient
k-way merge. Use the CCAN heap to implement this sort. However the CCAN
heap does not have a 'heap_replace' operation. We need to 'heap_pop'
then 'heap_push' to replace the root which means rebalancing twice
instead of once.
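
A minimal sketch of the k-way merge described above, under stated assumptions: it uses a small hand-written binary min-heap rather than the CCAN heap API, and `struct cursor`, `struct min_heap`, `MAX_BUFS` and all function names are hypothetical, not taken from dump_trace. It does show the pop-then-push replacement of the root, which rebalances the heap twice per emitted entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_BUFS 8

/* One cursor per input buffer; each buffer is already sorted by timestamp. */
struct cursor {
	const uint64_t *ts;	/* timestamps, ascending */
	size_t n;		/* number of entries */
	size_t pos;		/* next unread entry */
};

struct min_heap {
	struct cursor *item[MAX_BUFS];
	size_t len;
};

static uint64_t head(const struct cursor *c)
{
	return c->ts[c->pos];
}

void heap_push_cursor(struct min_heap *h, struct cursor *c)
{
	size_t i;

	assert(h->len < MAX_BUFS);
	i = h->len++;
	/* Sift up: move parents down until c's slot is found. */
	while (i > 0) {
		size_t parent = (i - 1) / 2;

		if (head(h->item[parent]) <= head(c))
			break;
		h->item[i] = h->item[parent];
		i = parent;
	}
	h->item[i] = c;
}

struct cursor *heap_pop_cursor(struct min_heap *h)
{
	struct cursor *top, *last;
	size_t i = 0;

	assert(h->len > 0);
	top = h->item[0];
	last = h->item[--h->len];
	/* Sift down: re-insert the last element starting from the root. */
	for (;;) {
		size_t child = 2 * i + 1;

		if (child >= h->len)
			break;
		if (child + 1 < h->len &&
		    head(h->item[child + 1]) < head(h->item[child]))
			child++;
		if (head(last) <= head(h->item[child]))
			break;
		h->item[i] = h->item[child];
		i = child;
	}
	h->item[i] = last;
	return top;
}

/* Merge k sorted buffers into out[]; returns the number of entries written. */
size_t merge_sorted(struct cursor *bufs, size_t k, uint64_t *out, size_t out_max)
{
	struct min_heap h = { .len = 0 };
	size_t written = 0;

	assert(k <= MAX_BUFS);
	for (size_t i = 0; i < k; i++)
		if (bufs[i].pos < bufs[i].n)
			heap_push_cursor(&h, &bufs[i]);

	while (h.len > 0 && written < out_max) {
		struct cursor *c = heap_pop_cursor(&h);	/* first rebalance */

		out[written++] = c->ts[c->pos++];
		if (c->pos < c->n)
			heap_push_cursor(&h, c);	/* second rebalance */
	}
	return written;
}
```

A dedicated replace-root operation would sift the new head down once instead; the pop and push pair here mirrors the trade-off the commit message mentions.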

Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
4 days ago ccan: Add CCAN heap source
Jordan Niethe [Mon, 1 Apr 2019 23:43:25 +0000 (10:43 +1100)] 
ccan: Add CCAN heap source

We would like to be able to use dump_trace to dump multiple trace
buffers at a time. The entries should be displayed in timestamp order.
As each buffer is already ordered on timestamp, a k-way merge is an
efficient method to sort the buffers together by timestamp. A heap can
be used to implement a k-way merge. As CCAN is already included in
Skiboot, use the CCAN heap. Add the source for heap.

Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
[stewart: ccan/heap: Make test run quieter]
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
4 days ago include/mem_region-malloc: Define calloc for CCAN Heap
Jordan Niethe [Mon, 1 Apr 2019 23:43:24 +0000 (10:43 +1100)] 
include/mem_region-malloc: Define calloc for CCAN Heap

We would like to be able to use dump_trace to dump multiple trace
buffers at a time. The entries should be displayed in timestamp order.
As each buffer is already ordered on timestamp, a k-way merge is an
efficient method to sort the buffers together by timestamp. A heap can
be used to implement a k-way merge. As CCAN is already included in
Skiboot, use the CCAN heap. The heap uses the calloc function which is
currently not defined in skiboot. Prepare for adding this heap by
defining calloc. Remove local calloc definition from libffs.c.

Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
4 days ago external/trace: mmap trace buffers in dump_trace
Jordan Niethe [Mon, 1 Apr 2019 23:43:23 +0000 (10:43 +1100)] 
external/trace: mmap trace buffers in dump_trace

The current lseek/read approach used in dump_trace does not correctly
handle certain aspects of the buffers. It does not use the start and end
position that is part of the buffer so it will not begin from the
correct location. It does not move back to the beginning of the trace
buffer file as the buffer wraps around. It also does not handle the
overflow case of the writer overwriting where the reader is up to.

Mmap the trace buffer file so that the existing reading functions in
extra/trace.c can be used. These functions already handle the cases of
wrapping and overflow.  This reduces code duplication and uses functions
that are already unit tested. However this requires a kernel where the
trace buffer sysfs nodes are able to be mmaped (see
https://patchwork.ozlabs.org/patch/1056786/)

Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
4 days ago external/trace: Introduce structure for reading traces
Jordan Niethe [Mon, 1 Apr 2019 23:43:22 +0000 (10:43 +1100)] 
external/trace: Introduce structure for reading traces

Currently the trace_get and trace_empty functions operate on a tracebuf
struct. This requires being able to write to that struct.  If dump_trace
were able to use these functions it would be convenient as it would
reduce code duplication and these functions are already unit tested.
However, a tracebuf accessed via mmaping will not be able to be written.

The tracebuf struct fields that need to be written are only to be used
by a reader.  The fields are never used by a writer. Add a new structure
for readers, trace_reader,  which contains these fields and remove them
from the tracebuf struct.  Change trace_get and trace_empty to use the
trace_reader struct and update the unit tests accordingly.

Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
4 days ago core/trace: Export trace buffers to sysfs
Jordan Niethe [Mon, 1 Apr 2019 23:43:21 +0000 (10:43 +1100)] 
core/trace: Export trace buffers to sysfs

Every property in the device-tree under /ibm,opal/firmware/exports has a
sysfs node created in /firmware/opal/exports. Add properties with the
physical address and size for each trace buffer so they are exported.

Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
4 days ago core/trace: Add pir number to debug_descriptor
Jordan Niethe [Mon, 1 Apr 2019 23:43:20 +0000 (10:43 +1100)] 
core/trace: Add pir number to debug_descriptor

The names given to the trace buffers when exported to sysfs should show
what cpu they are associated with to make it easier to understand their
output.  The debug_descriptor currently stores the address and length of
each trace buffer and this is used for adding properties to the device
tree. Extend debug_descriptor to include a cpu associated with each
trace. This will be used for creating properties in the device-tree
under /ibm,opal/firmware/exports/.

Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
4 days ago core/trace: Change trace buffer size
Jordan Niethe [Mon, 1 Apr 2019 23:43:19 +0000 (10:43 +1100)] 
core/trace: Change trace buffer size

We want to be able to mmap the trace buffers to be used by the
dump_trace tool. As mmaping is done in terms of pages it makes sense
that the size of the trace buffers should be page aligned.  This is
slightly complicated by the space taken up by the header at the
beginning of the trace and the room left for an extra trace entry at the
end of the buffer. Change the size of the buffer itself so that the
entire trace buffer size will be page aligned.
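
The sizing arithmetic can be sketched as follows. This is an assumption-laden illustration, not the skiboot code: `hdr_len` and `spare_len` stand for the header and the extra trace entry, and the function names and any concrete sizes are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Round v up to a multiple of align, which must be a power of two. */
uint64_t align_up(uint64_t v, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (v + align - 1) & ~(align - 1);
}

/*
 * Pick the length of the ring itself so that header + ring + spare entry
 * fills a whole number of pages. hdr_len and spare_len are placeholders,
 * not skiboot's real sizes.
 */
uint64_t ring_len_for(uint64_t wanted, uint64_t hdr_len, uint64_t spare_len,
		      uint64_t page_size)
{
	uint64_t total = align_up(hdr_len + wanted + spare_len, page_size);

	return total - hdr_len - spare_len;
}
```

With a 64-byte header, a 128-byte spare entry and 64K pages, a request for 60000 bytes yields a ring of 65344 bytes, so the whole structure occupies exactly one page. Note that 65344 is not a power of two.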

Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
4 days ago core/trace: Change buffer alignment from 4K to 64K
Jordan Niethe [Mon, 1 Apr 2019 23:43:18 +0000 (10:43 +1100)] 
core/trace: Change buffer alignment from 4K to 64K

We want to be able to mmap the trace buffers to be used by the
dump_trace tool. This means that the trace buffers must be page
aligned.  Currently they are aligned to 4K. Most power systems have a
64K page size. On systems with a 4K page size, 64K aligned will still be
page aligned.  Change the allocation of the trace buffers to be 64K
aligned.

The trace_info struct that contains the trace buffer is actually what is
allocated aligned memory. This means the trace buffer itself is not
actually aligned and this is the address that is currently exposed
through sysfs.  To get around this change the address that is exposed to
sysfs to be the trace_info struct. This means the lock in trace_info is
now visible too.

Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
4 days ago core/trace: Change mask/and to modulo for buffer offset
Jordan Niethe [Mon, 1 Apr 2019 23:43:17 +0000 (10:43 +1100)] 
core/trace: Change mask/and to modulo for buffer offset

We would like to be able to mmap the trace buffers so that the
dump_trace tool is able to make use of the existing functions for
reading traces in external/trace. Mmaping is done by pages which means
that buffers should be aligned to page size. This is not as simple as
setting the buffer length to a page aligned value as the buffers each
have a header and leave space for an extra entry at the end. These must
be taken into account so the entire buffer will be page aligned.

The current method of calculating buffer offsets is to use a mask and
bitwise 'and'. This limits the potential sizes of the buffer to powers
of two. The initial justification for using the mask was that the
buffers had different sizes so the offset needed to be based on information
the buffers carried with them, otherwise they could overflow.

Being limited to powers of two will make it impossible to page align the
entire buffer. Change to using modulo for calculating the buffer offset
to make a much larger range of buffer sizes possible. Instead of the
mask, make each buffer carry around the length of the buffer to be used
for calculating the offset to avoid overflows.
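
To make the restriction concrete, here is a small sketch of the two offset calculations. The function names are hypothetical and the real code operates on the trace buffer structures rather than bare integers.

```c
#include <assert.h>
#include <stdint.h>

/* Offset via bitwise AND: only correct when len is a power of two. */
uint64_t offset_by_mask(uint64_t pos, uint64_t len)
{
	assert(len != 0 && (len & (len - 1)) == 0);
	return pos & (len - 1);
}

/* Offset via modulo: correct for any non-zero len. */
uint64_t offset_by_modulo(uint64_t pos, uint64_t len)
{
	assert(len != 0);
	return pos % len;
}
```

For a length of 64 both give 6 at position 70. For a length of 48, masking with 47 would still give 6, whereas the correct wrapped offset is 70 % 48 = 22, which is why the mask cannot be kept once lengths stop being powers of two.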

Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
4 days ago core/test/run-trace: Stop using indeterminate fields
Jordan Niethe [Mon, 1 Apr 2019 23:43:16 +0000 (10:43 +1100)] 
core/test/run-trace: Stop using indeterminate fields

The parallel_test uses the cpu field of the trace_hdr struct. However it
is expected that some of the trace entries that are gotten will be
trace_overflow structs. This type of entry leaves the cpu field
indeterminate when it is interpreted as a trace_hdr struct. This means
it is possible that tests will fail when they try to use the cpu field of an
overflow struct, as the cpu field could hold anything.

Move the checks that use the cpu field until after it has been
determined whether the trace entry is an overflow type.
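
The ordering issue can be sketched with a simplified union. Everything here is hypothetical (the `demo_` types, the field layout and the type constants are not the skiboot definitions); the point is only that the type is inspected before the cpu field is trusted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { DEMO_TRACE_REPEAT = 1, DEMO_TRACE_OVERFLOW = 2 };

/* Simplified entry header: cpu is only meaningful for non-overflow entries. */
struct demo_hdr {
	uint64_t timestamp;
	uint8_t type;
	uint8_t len_div_8;
	uint16_t cpu;
	uint8_t unused[4];
};

/* Overflow entries share the leading fields but never set a cpu. */
struct demo_overflow {
	uint64_t unused64;
	uint8_t type;
	uint8_t len_div_8;
	uint8_t unused[6];
	uint64_t bytes_missed;
};

union demo_entry {
	struct demo_hdr hdr;
	struct demo_overflow overflow;
};

/* Returns false, leaving *cpu untouched, when the entry is an overflow record. */
bool entry_cpu(const union demo_entry *e, uint16_t *cpu)
{
	assert(e && cpu);
	if (e->hdr.type == DEMO_TRACE_OVERFLOW)
		return false;
	*cpu = e->hdr.cpu;
	return true;
}
```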

Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
4 days ago external/trace: Use correct width integer byte swapping
Jordan Niethe [Mon, 1 Apr 2019 23:43:15 +0000 (10:43 +1100)] 
external/trace: Use correct width integer byte swapping

The trace_repeat struct uses be16 for storing the number of repeats.
Currently be32_to_cpu conversion is used to display this member. This
produces an incorrect value. Use be16_to_cpu instead.
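
A worked illustration of why the width matters, assuming a little-endian host. `swap16` and `swap32` are hypothetical stand-ins for what be16_to_cpu and be32_to_cpu do on such a host; they are not the skiboot macros.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the two bytes of a 16-bit value. */
uint16_t swap16(uint16_t v)
{
	return (uint16_t)((v >> 8) | (v << 8));
}

/* Reverse the four bytes of a 32-bit value. */
uint32_t swap32(uint32_t v)
{
	return ((v & 0x000000ffu) << 24) | ((v & 0x0000ff00u) << 8) |
	       ((v & 0x00ff0000u) >> 8) | ((v & 0xff000000u) >> 24);
}

/* Decode a big-endian 16-bit counter as a little-endian host sees it. */
uint16_t repeats_from_le_view(uint16_t raw)
{
	uint16_t n = swap16(raw);

	assert(swap16(n) == raw);	/* swapping twice must round-trip */
	return n;
}
```

A big-endian repeat count of 3 is seen as 0x0300 on a little-endian host. The 16-bit swap recovers 3, while widening that value to 32 bits and swapping gives 0x00030000 (196608).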

Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
4 days ago external/trace: Fix endianness detection in Makefile
Jordan Niethe [Mon, 1 Apr 2019 23:43:14 +0000 (10:43 +1100)] 
external/trace: Fix endianness detection in Makefile

The Makefile for the dump_trace tool does not correctly determine
endianness on Power. Instead Big Endian is always used on Power. Fix so
Little Endian will be detected.

Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
4 days ago core/trace: Put boot_tracebuf in correct location.
Jordan Niethe [Mon, 1 Apr 2019 23:43:13 +0000 (10:43 +1100)] 
core/trace: Put boot_tracebuf in correct location.

A position for the boot_tracebuf is allocated in skiboot.lds.S.
However, without a __section attribute the boot trace buffer is not
placed in the correct location, meaning that it also will not be
correctly aligned.  Add the __section attribute to ensure it will be
placed in its allocated position.

Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
4 days ago core/test/run-trace: Fix type in testing struct
Jordan Niethe [Mon, 1 Apr 2019 23:43:12 +0000 (10:43 +1100)] 
core/test/run-trace: Fix type in testing struct

A mock cpu thread structure is used for testing the skiboot trace
buffers. This contains the parts of the actual structure that are needed
for the tests. Within the mock structure the type used for server_no is
different from the real structure. Change the type so it is consistent.

Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
4 days ago nx: remove check on the "qemu, powernv" property
Cédric Le Goater [Thu, 11 Apr 2019 14:45:38 +0000 (16:45 +0200)] 
nx: remove check on the "qemu, powernv" property

commit 95f7b3b9698b ("nx: Don't abort on missing NX when using a QEMU
machine") introduced a check on the property "qemu,powernv" to skip NX
initialization when running under a QEMU machine.

The QEMU platforms now expose a QUIRK_NO_RNG in the chip. Testing the
"qemu,powernv" property is not necessary anymore.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Reviewed-by: Andrew Jeffery <andrew@aj.id.au>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
4 days ago chip: add no-nx quirk for all QEMU platforms
Cédric Le Goater [Thu, 11 Apr 2019 14:45:37 +0000 (16:45 +0200)] 
chip: add no-nx quirk for all QEMU platforms

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Reviewed-by: Andrew Jeffery <andrew@aj.id.au>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
4 days ago plat/qemu: add a POWER8 and POWER9 platform
Cédric Le Goater [Thu, 11 Apr 2019 14:45:36 +0000 (16:45 +0200)] 
plat/qemu: add a POWER8 and POWER9 platform

These new QEMU platforms have characteristics closer to real OpenPOWER
systems that we use today and define a different BMC depending on the
CPU type. New platform properties are introduced for each,
"qemu,powernv8", "qemu,powernv9" and these should be compatible with
existing QEMUs which only expose the "qemu,powernv" property

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Reviewed-by: Andrew Jeffery <andrew@aj.id.au>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
4 days ago core/lock: Add debug options to store backtrace of where lock was taken
Andrew Donnellan [Thu, 18 Apr 2019 05:21:06 +0000 (15:21 +1000)] 
core/lock: Add debug options to store backtrace of where lock was taken

Contrary to popular belief, skiboot developers are imperfect and
occasionally write locking bugs. When we exit skiboot, we check if we're
still holding any locks, and if so, we print an error with a list of the
locks currently held and the locations where they were taken.

However, this only tells us the location where lock() was called, which may
not be enough to work out what's going on. To give us more to go on with,
we can store backtrace data in the lock and print that out when we
unexpectedly still hold locks.

Because the backtrace data is rather big, we only enable this if
DEBUG_LOCKS_BACKTRACE is defined, which in turn is switched on when
DEBUG=1.

(We disable DEBUG_LOCKS_BACKTRACE in some of the memory allocation tests
because the locks used by the memory allocator take up too much room in the
fake skiboot heap.)

Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
4 days ago hw/npu2-opencapi: Add initial support for allocating OpenCAPI LPC memory
Andrew Donnellan [Tue, 14 May 2019 01:10:33 +0000 (11:10 +1000)] 
hw/npu2-opencapi: Add initial support for allocating OpenCAPI LPC memory

Lowest Point of Coherency (LPC) memory allows the host to access memory on
an OpenCAPI device.

Define 2 OPAL calls, OPAL_NPU_MEM_ALLOC and OPAL_NPU_MEM_RELEASE, for
assigning and clearing the memory BAR. (We try to avoid using the term
"LPC" to avoid confusion with Low Pin Count.)

At present, we use a fixed location in the address space, which means we
are restricted to a single range of 4TB, on a single OpenCAPI device per
chip. In future, we'll use some chip ID extension magic to give us more
space, and some sort of allocator to assign ranges to more than one device.

Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
4 days ago libc/string: speed up common string functions
Nicholas Piggin [Fri, 10 May 2019 04:47:09 +0000 (14:47 +1000)] 
libc/string: speed up common string functions

Use compiler builtins for the string functions, and compile the
libc/string/ directory with -O2.

This reduces instructions booting skiboot in mambo by 2.9 million in
slow-sim mode, or 3.8 million in normal mode, for less than 1kB image
size increase.

This can result in the compiler warning more cases of string function
problems.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
4 days ago device-tree: speed up fdt building on slow simulators
Nicholas Piggin [Fri, 10 May 2019 04:46:27 +0000 (14:46 +1000)] 
device-tree: speed up fdt building on slow simulators

Trade size for speed and avoid de-duplicating strings in the fdt.
This costs about 2kB in fdt size, and saves about 8 million instructions
(almost half of all instructions) booting skiboot in mambo.

This was tracked down by Michael Neuling <mikey@neuling.org>.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
4 days ago libfdt: upgrade to upstream dtc.git 243176c
Nicholas Piggin [Fri, 10 May 2019 04:46:26 +0000 (14:46 +1000)] 
libfdt: upgrade to upstream dtc.git 243176c

Upgrade libfdt/ to github.com/dgibson/dtc.git 243176c ("Fix bogus
error on rebuild")

This copies dtc/libfdt/ to skiboot/libfdt/, with the only change in
that directory being the addition of README.skiboot and Makefile.inc.

This adds about 14kB text, 2.5kB compressed xz. This could be reduced
or mostly eliminated by cutting out fdt version checks and unused
code, but tracking upstream is a bigger benefit at the moment.

This loses commits:

  14ed2b842f61 ("libfdt: add basic sanity check to fdt_open_into")
  bc7bb3d12bc1 ("sparse: fix declaration of fdt_strerror")

As well as some prehistoric similar kinds of things, which is the
punishment for us not being good downstream citizens and sending
things upstream! Syncing to upstream will make that effort simpler
in future.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
4 days ago opal-gard: Account for ECC size when clearing partition
Oliver O'Halloran [Fri, 10 May 2019 04:44:59 +0000 (14:44 +1000)] 
opal-gard: Account for ECC size when clearing partition

When 'opal-gard clear all' is run, it works by erasing the GUARD then
using blocklevel_smart_write() to write nothing to the partition. This
second write call is needed because we rely on libflash to set the ECC
bits appropriately when the partition contained ECCed data.

The API for this is a little odd with the caller specifying how much
actual data to write, and libflash writing size + size/8 bytes
since there is one additional ECC byte for every eight bytes of data.

We currently do not account for the extra space consumed by the ECC data
in reset_partition(), which is used to handle the 'clear all' command.
This results in the partition following the GUARD partition being
partially overwritten when the command is used. This patch fixes the
problem by reducing the length we would normally write by the number
of ECC bytes required.
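
The size arithmetic can be worked through with a short sketch. It relies only on the ratio stated above (one ECC byte per eight data bytes); the helper names and the example partition size are hypothetical and are not the libflash API.

```c
#include <assert.h>
#include <stdint.h>

#define DATA_BYTES_PER_ECC_BYTE 8

/* Bytes consumed on flash when data_len bytes are written with ECC. */
uint64_t ecc_footprint(uint64_t data_len)
{
	return data_len + data_len / DATA_BYTES_PER_ECC_BYTE;
}

/* Largest data length, a multiple of eight, whose footprint fits in part_len. */
uint64_t max_ecc_data_len(uint64_t part_len)
{
	uint64_t len = part_len / (DATA_BYTES_PER_ECC_BYTE + 1) *
		       DATA_BYTES_PER_ECC_BYTE;

	assert(ecc_footprint(len) <= part_len);
	return len;
}
```

For a hypothetical 0x5000-byte partition, passing the full partition size as the data length gives a footprint of 0x5A00 bytes, which runs 0xA00 bytes into whatever follows. Capping the data length at 18200 bytes keeps the footprint at 20475 bytes, inside the partition.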

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Tested-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
4 days ago hw/phb4: Make pci-tracing print at PR_NOTICE
Oliver O'Halloran [Mon, 6 May 2019 04:00:29 +0000 (14:00 +1000)] 
hw/phb4: Make pci-tracing print at PR_NOTICE

When pci-tracing is enabled we print each trace status message and the
final trace status at PR_ERROR. The final status messages are similar to
those printed when we fail to train in the non-pci-tracing path and this
has resulted in spurious op-test failures.

This patch reduces the log-level of the tracing message to PR_NOTICE so
they're not accidentally interpreted as actual error messages. PR_NOTICE
messages are still printed to the console during boot.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
4 days ago gard: Use consistent name
Vasant Hegde [Sun, 12 May 2019 11:34:01 +0000 (17:04 +0530)] 
gard: Use consistent name

During compilation we generate the binary file as "gard" and during
installation we install it as "opal-gard". This seems to be
creating some confusion. Some people think these are two
different tools. Hence let's use a common name for the gard tool
("opal-gard").

Cc: Oliver O'Halloran <oohall@gmail.com>
Cc: Andrew Jeffery <andrew@aj.id.au>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
4 days ago occ-sensors: Check if OCC is reset while reading inband sensors
Shilpasri G Bhat [Tue, 14 May 2019 09:33:52 +0000 (15:03 +0530)] 
occ-sensors: Check if OCC is reset while reading inband sensors

OCC may not be able to mark the sensor buffer as invalid while going
down RESET. If OCC never comes back we will continue to read the stale
sensor data. So verify if OCC is reset while reading the sensor values
and propagate the appropriate error.

Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
4 days ago skiboot.tcl: Add option to wait for GDB server connection
Alistair Popple [Thu, 16 May 2019 05:44:33 +0000 (15:44 +1000)] 
skiboot.tcl: Add option to wait for GDB server connection

Add an environment variable which makes Mambo wait for a connection
from gdb prior to starting simulation.

Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
9 days ago mambo: Integrate addr2line into backtrace command
Michael Neuling [Wed, 17 Apr 2019 01:54:29 +0000 (11:54 +1000)] 
mambo: Integrate addr2line into backtrace command

Gives nice output like this:

   systemsim % bt
   pc:                             0xC0000000002BF3D4      _savegpr0_28+0x0
   lr:                             0xC00000000004E0F4      opal_call+0x10
   stack:0x000000000041FAE0        0xC00000000004F054      opal_check_token+0x20
   stack:0x000000000041FB50        0xC0000000000500CC      __opal_flush_console+0x88
   stack:0x000000000041FBD0        0xC000000000050BF8      opal_flush_console+0x24
   stack:0x000000000041FC00        0xC0000000001F9510      udbg_opal_putc+0x88
   stack:0x000000000041FC40        0xC000000000020E78      udbg_write+0x7c
   stack:0x000000000041FC80        0xC0000000000B1C44      console_unlock+0x47c
   stack:0x000000000041FD80        0xC0000000000B2424      register_console+0x320
   stack:0x000000000041FE10        0xC0000000003A5328      register_early_udbg_console+0x98
   stack:0x000000000041FE80        0xC0000000003A4F14      setup_arch+0x68
   stack:0x000000000041FEF0        0xC0000000003A0880      start_kernel+0x74
   stack:0x000000000041FF90        0xC00000000000AC60      start_here_common+0x1c

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
9 days ago mambo: Add addr2func for symbol resolution
Michael Neuling [Wed, 17 Apr 2019 01:54:28 +0000 (11:54 +1000)] 
mambo: Add addr2func for symbol resolution

If you supply a VMLINUX_MAP/SKIBOOT_MAP/USER_MAP addr2func can guess
at your symbol name, e.g.:

  systemsim % p pc
  0xC0000000002A68F8
  systemsim % addr2func [p pc]
  fdt_offset_ptr+0x78

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
9 days ago Add P9 DIO interrupt support
Lei YU [Fri, 18 Jan 2019 02:30:06 +0000 (10:30 +0800)] 
Add P9 DIO interrupt support

On P9, GPIO ports 0, 1 and 2 can raise GPIO interrupts, and the DIO
interrupt is used to handle them.

Add support for the DIO interrupts:
1. Add dio_interrupt_register(chip, port, callback) to register the
   interrupt;
2. Add dio_interrupt_deregister(chip, port, callback) to deregister;
3. When interrupt on the port occurs, callback is invoked, and the
   interrupt status is cleared.

Signed-off-by: Lei YU <mine260309@gmail.com>
[oliver: Fixed Makefile.inc merge conflict]
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
9 days ago fsp/leds: improve string operations bounds checking
Nicholas Piggin [Wed, 8 May 2019 06:17:52 +0000 (16:17 +1000)] 
fsp/leds: improve string operations bounds checking

The current code has a few possible issues with string handling, and
gcc flags a number of string / buffer warnings when enabling more
checking.

Some of the issues in the file:

- Mixing of null-terminated arrays (in most cases), and non-null in the
  input/output buffer format. memcpy generally should be used when the
  length is known.
- Lack of input data length bounds checking. Malformed input could
  cause overruns.
- String copying from same sized source and destination array sizes,
  where the source is a NUL terminated string, so the strncpy copies
  the string without its NUL terminator, which becomes NUL terminated
  at the zeroed destination array. Compiler does not like this, and
  it only works if the destination has been zeroed, so not a great
  pattern.
- Attempting to NUL terminate string using strcat, which will overwrite
  a byte past the end of the array if the string length is at maximum,
  or worse if the input was malformed.
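
One way to avoid the patterns listed above is a single bounded copy helper built on memcpy. This is a minimal sketch under stated assumptions, not the code from the patch: `copy_bounded` is a hypothetical name, and it assumes the caller knows the source length, as is the case for the non-NUL-terminated buffer format.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy at most dst_size - 1 bytes from src and always NUL-terminate dst.
 * src need not be NUL-terminated; src_len is the number of valid bytes.
 * Returns the number of bytes copied, excluding the terminator.
 */
size_t copy_bounded(char *dst, size_t dst_size, const char *src, size_t src_len)
{
	size_t n;

	assert(dst && src && dst_size > 0);
	n = src_len < dst_size - 1 ? src_len : dst_size - 1;
	memcpy(dst, src, n);
	dst[n] = '\0';
	return n;
}
```

Because the terminator is written explicitly inside the bound, the result does not depend on the destination having been zeroed and cannot write past the end of the array.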

This patch fixes several of these issues and fixes a number of compiler
warnings. In general, the buffer and string handling could probably
benefit from a more in-depth audit.

Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Tested-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
9 days ago build: allow per-directory flag additions and subtractions
Nicholas Piggin [Wed, 8 May 2019 06:17:51 +0000 (16:17 +1000)] 
build: allow per-directory flag additions and subtractions

Expand the existing per-file flags. This will be used in future
changes.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
9 days ago build: Makefile clean gcov files
Nicholas Piggin [Wed, 8 May 2019 06:17:50 +0000 (16:17 +1000)] 
build: Makefile clean gcov files

Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
9 days ago core/pci: pci_slot_add_loc use null-terminated strings
Nicholas Piggin [Wed, 8 May 2019 06:17:49 +0000 (16:17 +1000)] 
core/pci: pci_slot_add_loc use null-terminated strings

Use null-terminated strings consistently, making the maximum string
length in all cases the same, and avoiding dt_add_property_nstr.

This avoids the following warning that appears after adding more
checking to string ops:

  core/pci-slot.c: In function ‘pci_slot_add_loc’:
  skiboot/libc/include/string.h:19:17: warning: ‘__builtin_strncpy’
    specified bound 80 equals destination size [-Wstringop-truncation]
   #define strncpy __builtin_strncpy
  core/pci-slot.c:244:3: note: in expansion of macro ‘strncpy’
     strncpy(loc_code, label, sizeof(loc_code));
     ^~~~~~~

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
9 days ago fdt: check more errors
Nicholas Piggin [Wed, 8 May 2019 06:17:48 +0000 (16:17 +1000)] 
fdt: check more errors

This catches a few more error cases in fdt building.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
9 days ago external/mambo: Add an option to exit Mambo when the system is shutdown
Alistair Popple [Wed, 8 May 2019 06:17:47 +0000 (16:17 +1000)] 
external/mambo: Add an option to exit Mambo when the system is shutdown

Automatically exiting can be convenient for scripting. Will also exit
due to a HW crash (eg. unhandled exception).

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[stewart: handle case where SKIBOOT_AUTORUN is not set]
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
9 days ago xive: Remove xive rev field and recognize P9P
Nicholas Piggin [Wed, 8 May 2019 06:17:46 +0000 (16:17 +1000)] 
xive: Remove xive rev field and recognize P9P

All supported P9s are the revision 2 xive model, so there is no point
to keeping it around. This avoids P9P being reported as unknown rev
(which doesn't cause any other problems).

Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
9 days ago fast-reboot: skip read-only memory checksum for slow simulators
Nicholas Piggin [Fri, 10 May 2019 04:44:23 +0000 (14:44 +1000)] 
fast-reboot: skip read-only memory checksum for slow simulators

Skip the fast reboot checksum, which costs about 4 million cycles
booting skiboot in mambo.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
9 days ago core/fast-reboot: Add im-feeling-lucky option
Suraj Jitindar Singh [Mon, 13 May 2019 03:18:53 +0000 (13:18 +1000)] 
core/fast-reboot: Add im-feeling-lucky option

Fast reboot gets disabled for a number of reasons e.g. the availability
of nvlink. However this doesn't actually affect the ability to perform fast
reboot if no nvlink device is actually present.

Add a nvram option for fast-reset where if it's set to
"im-feeling-lucky" then perform the fast-reboot irrespective of if it's
previously been disabled.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Acked-by: Russell Currey <ruscur@russell.cc>
[stewart: update nvram_query_eq to nvram_query_eq_dangerous]
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
9 days agoxscom: move more register definitions into processor-specific includes
Nicholas Piggin [Mon, 13 May 2019 05:21:36 +0000 (15:21 +1000)] 
xscom: move more register definitions into processor-specific includes

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
9 days agonvram: Flag dangerous NVRAM options
Michael Neuling [Mon, 13 May 2019 07:09:39 +0000 (17:09 +1000)] 
nvram: Flag dangerous NVRAM options

Most nvram options used by skiboot are just for debug or testing for
regressions. They should never be used long term.

We've hit a number of issues in testing and the field where nvram
options have been set "temporarily" but haven't been properly cleared
after, resulting in crashes or real bugs being masked.

This patch marks most nvram options used by skiboot as dangerous and
prints a chicken to remind users of the problem.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Reviewed-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
Acked-By: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
11 days agodevicetree: Don't set path to dtc in makefile
Joel Stanley [Fri, 10 May 2019 05:21:55 +0000 (14:51 +0930)] 
devicetree: Don't set path to dtc in makefile

By setting the path we fail to build under buildroot which has its own
set of host tools in PATH, but not at /usr/bin.

Keep the variable so it can be set if need be but default to whatever
'dtc' is in the user's path.

Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2 weeks agoskiboot v6.3.1 release notes
Vasant Hegde [Fri, 10 May 2019 05:19:41 +0000 (10:49 +0530)] 
skiboot v6.3.1 release notes

Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2 weeks agoskiboot v6.2.4 release notes
Vasant Hegde [Wed, 8 May 2019 11:32:09 +0000 (17:02 +0530)] 
skiboot v6.2.4 release notes

Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2 weeks agoskiboot v6.0.20 release notes
Vasant Hegde [Wed, 8 May 2019 08:39:15 +0000 (14:09 +0530)] 
skiboot v6.0.20 release notes

Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2 weeks agodoc/bmc: Document SBE validation on P8 platforms
Samuel Mendoza-Jonas [Thu, 9 May 2019 03:08:41 +0000 (13:08 +1000)] 
doc/bmc: Document SBE validation on P8 platforms

Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
Reviewed-by: Andrew Jeffery <andrew@aj.id.au>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2 weeks agoplatforms/astbmc: Check for SBE validation step
Samuel Mendoza-Jonas [Thu, 9 May 2019 03:08:40 +0000 (13:08 +1000)] 
platforms/astbmc: Check for SBE validation step

On some POWER8 astbmc systems an update to the SBE requires pausing at
runtime to ensure integrity of the SBE. If this is required the BMC will
set a chassis boot option IPMI flag using the OEM parameter 0x62. If
Skiboot sees this flag is set it waits until the SBE update is complete
and the flag is cleared.
Unfortunately the mystery operation that validates the SBE also leaves
it in a bad state and unable to be used for timer operations. To
work around this the flag is checked as soon as possible (ie. when IPMI
and the console are set up), and once complete the system is rebooted.

Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Reviewed-by: Andrew Jeffery <andrew@aj.id.au>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2 weeks agoinclude/ipmi: Fix incorrect chassis commands
Samuel Mendoza-Jonas [Thu, 9 May 2019 03:08:39 +0000 (13:08 +1000)] 
include/ipmi: Fix incorrect chassis commands

These commands are listed in the order they appear in the IPMI
specification but with the wrong values - correct them!

Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
Reviewed-by: Andrew Jeffery <andrew@aj.id.au>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2 weeks agoipmi: ensure forward progress on ipmi_queue_msg_sync()
Stewart Smith [Wed, 1 May 2019 07:05:56 +0000 (17:05 +1000)] 
ipmi: ensure forward progress on ipmi_queue_msg_sync()

BT responses are handled using a timer doing the polling. To hope to
get an answer to an IPMI synchronous message, the timer needs to run.

We can't just check all timers though as there may be a timer that
wants a lock that's held by a code path calling ipmi_queue_msg_sync(),
and if we did enforce that as a requirement, it's a pretty subtle
API that is asking to be broken.

So, if we just run a poll function to crank anything that the IPMI
backend needs, then we should be fine.
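
A rough sketch of the idea, assuming a backend that exposes a poll hook.
All names here (ipmi_backend_sketch, sketch_poll, sync_wait_sketch) are
hypothetical and the reply latency is simulated; the point is only that the
synchronous wait cranks the IPMI backend alone rather than every timer.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical backend: poll() cranks whatever the transport needs. */
struct ipmi_backend_sketch {
    void (*poll)(struct ipmi_backend_sketch *b);
    int polls_until_reply;  /* simulated response latency */
    bool reply_ready;
};

/* Simulated transport: the reply arrives after a few polls. */
void sketch_poll(struct ipmi_backend_sketch *b)
{
    if (b->polls_until_reply > 0)
        b->polls_until_reply--;
    if (b->polls_until_reply == 0)
        b->reply_ready = true;
}

/*
 * Wait for the reply by polling only the backend, so no unrelated timer
 * (which might want a lock the caller already holds) is ever run.
 * Returns the number of polls used, or -1 if max_polls was exhausted.
 */
int sync_wait_sketch(struct ipmi_backend_sketch *b, int max_polls)
{
    int n = 0;

    while (!b->reply_ready) {
        if (n == max_polls)
            return -1;
        b->poll(b);
        n++;
    }
    return n;
}
```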

This issue shows up very quickly under QEMU when loading the first
flash resource with the IPMI HIOMAP backend.

Reported-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
Reviewed-by: Andrew Jeffery <andrew@aj.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2 weeks agopci/iov: Remove skiboot VF tracking
Oliver O'Halloran [Wed, 1 May 2019 08:05:59 +0000 (18:05 +1000)] 
pci/iov: Remove skiboot VF tracking

This feature was added a few years ago in response to a request to make
the MaxPayloadSize (MPS) field of a Virtual Function match the MPS of the
Physical Function that hosts it.

The SR-IOV specification states that the MPS field of the VF is "ResvP".
This indicates the VF will use whatever MPS is configured on the PF and
that the field should be treated as a reserved field in the config space
of the VF. In other words, a SR-IOV spec compliant VF should always return
zero in the MPS field.  Adding hacks in OPAL to make it non-zero is...
misguided at best.

Additionally, there is a bug in the way pci_device structures are handled
by VFs that results in a crash on fast-reboot that occurs if VFs are
enabled and then disabled prior to rebooting. This patch fixes the bug by
removing the code entirely. This patch has no impact on SR-IOV support on
the host operating system.

Cc: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
Cc: skiboot-stable@lists.ozlabs.org
Tested-by: Santwana Samantray <santwana.samantray@in.ibm.com>
Tested-by: Satheesh Rajendran <satheera@in.ibm.com>
[oliver: added tested-bys]
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
3 weeks agoskiboot v6.3 release notes v6.3
Stewart Smith [Fri, 3 May 2019 07:28:55 +0000 (17:28 +1000)] 
skiboot v6.3 release notes

Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
3 weeks agoDisable fast-reset for POWER8
Stewart Smith [Fri, 3 May 2019 06:45:53 +0000 (16:45 +1000)] 
Disable fast-reset for POWER8

There is a bug with fast-reset when CPU cores are busy, which can be
reproduced by running `stress` and then trying `reboot -ff` (this is
what the op-test test cases FastRebootHostStress and
FastRebootHostStressTorture do). What happens is the cores lock up,
which isn't the best thing in the world when you want them to start
executing instructions again.

A workaround is to use instruction ramming, which while greatly
increasing the reliability of fast-reset on p8, doesn't make it perfect.

Instruction ramming is what pdbg was modified to do in order to have the
sreset functionality work reliably on p8.
pdbg patches: https://patchwork.ozlabs.org/project/pdbg/list/?series=96593&state=*

Fixes: https://github.com/open-power/skiboot/issues/185
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
3 weeks agopci: Try harder to add meaningful ibm,loc-code
Stewart Smith [Fri, 3 May 2019 06:24:34 +0000 (01:24 -0500)] 
pci: Try harder to add meaningful ibm,loc-code

We keep the existing logic of looking to the parent for the slot-label or
slot-location-code, but if all that fails we now look directly for the
slot-location-code (as this should give us the correct loc code for things
directly under the PHB), and otherwise we just look for a loc-code.

The applicable bit of PAPR here is:

  R1–12.1–1. Each instance of a hardware entity (FRU) has a platform
  unique location code and any node in the OF device tree that describes
  a part of a hardware entity must include the “ibm,loc-code” property
  with a value that represents the location code for that hardware entity.

which we weren't really fully obeying at any recent (ever?) point in
time. Now we should do okay, at least for PCI.

Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
3 weeks agoskiboot v6.3-rc3 release notes v6.3-rc3
Stewart Smith [Thu, 2 May 2019 08:29:58 +0000 (18:29 +1000)] 
skiboot v6.3-rc3 release notes

Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
3 weeks agoMark all partitions except full PNOR and boot kernel firmware read only
Timothy Pearson [Fri, 26 Apr 2019 17:01:04 +0000 (12:01 -0500)] 
Mark all partitions except full PNOR and boot kernel firmware read only

FFS partitions don't always align on erase blocks.  Mark any partitions
not known to align on erase blocks as read only to prevent silent corruption
of adjacent partitions during erase / write from the host.
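
To illustrate the hazard (not the method used by this patch, which selects
partitions by identity), the sketch below computes the range an erase must
cover for a given partition. The function names and the example offsets are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Compute the erase-block-aligned range [*lo, *hi) that must be erased to
 * cover the bytes [start, start + size). Flash can only be erased in whole
 * blocks, so the range is widened outwards to block boundaries.
 */
void erase_span_sketch(uint32_t start, uint32_t size, uint32_t erase_size,
                       uint32_t *lo, uint32_t *hi)
{
    uint32_t end = start + size;

    assert(erase_size > 0);
    *lo = start - (start % erase_size);
    *hi = ((end + erase_size - 1) / erase_size) * erase_size;
}

/* True if erasing this partition would also erase bytes outside it. */
bool erase_spills_sketch(uint32_t start, uint32_t size, uint32_t erase_size)
{
    uint32_t lo, hi;

    erase_span_sketch(start, size, erase_size, &lo, &hi);
    return lo != start || hi != start + size;
}
```

For example, with a 0x1000 byte erase block a partition at 0x1800 of size
0x1000 needs 0x1000..0x3000 erased, which touches both of its neighbours.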

Signed-off-by: Timothy Pearson <tpearson@raptorengineering.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
3 weeks agoExpose PNOR Flash partitions to host MTD driver via devicetree
Timothy Pearson [Fri, 26 Apr 2019 17:00:44 +0000 (12:00 -0500)] 
Expose PNOR Flash partitions to host MTD driver via devicetree

This makes it possible for the host to directly address each
partition without requiring each application to directly parse
the FFS headers.  This has been in use for some time already to
allow BOOTKERNFW partition updates from the host.

Signed-off-by: Timothy Pearson <tpearson@raptorengineering.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
3 weeks agoWrite boot progress to LPC ports 81 and 82
Stewart Smith [Fri, 26 Apr 2019 17:00:18 +0000 (12:00 -0500)] 
Write boot progress to LPC ports 81 and 82

There's a thought to write more extensive boot progress codes to LPC
ports 81 and 82 to supplement/replace any reliance on port 80.

We want to still emit port 80 for platforms like Zaius and Barreleye
that have the physical display. Ports 81 and 82 can be monitored by a
BMC though.

Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
3 weeks agoWrite boot progress to LPC port 80h
Stewart Smith [Fri, 26 Apr 2019 16:59:57 +0000 (11:59 -0500)] 
Write boot progress to LPC port 80h

This is an adaptation of what we currently do for op_display() on FSP
machines, inventing an encoding for what we can write into the single
byte at LPC port 80h.

Port 80h is often used on x86 systems to indicate boot progress/status
and dates back a decent amount of time. Since a byte isn't exactly very
expressive for everything that can go on (and wrong) during boot, it's
all about compromise.

Some systems (such as Zaius/Barreleye G2) have a physical dual 7 segment
display that displays these codes. So far, this has only been driven by
hostboot (see hostboot commit 90ec2e65314c).

Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
3 weeks agoRemove Talos DT match from Romulus file
Timothy Pearson [Wed, 24 Apr 2019 07:57:48 +0000 (07:57 +0000)] 
Remove Talos DT match from Romulus file

Signed-off-by: Timothy Pearson <tpearson@raptorengineering.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
3 weeks agoCopy and convert Romulus descriptors to Talos
Timothy Pearson [Fri, 26 Apr 2019 16:59:09 +0000 (11:59 -0500)] 
Copy and convert Romulus descriptors to Talos

Talos II has some hardware differences from Romulus, therefore
we cannot guarantee Talos II == Romulus in skiboot.  Copy and
slightly modify the Romulus files for Talos II.

Signed-off-by: Timothy Pearson <tpearson@raptorengineering.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
3 weeks agohw/phb4: Fix references to PHB3
Oliver O'Halloran [Mon, 29 Apr 2019 07:17:40 +0000 (17:17 +1000)] 
hw/phb4: Fix references to PHB3

Currently most of the functionality of phb4_lsi_attributes() is disabled
when we have #defined DISABLE_ERR_INTS. This is the default behaviour
and #undefing the constant results in skiboot not compiling because the
code was not updated when it was copied across from PHB3. This patch
fixes the problem by changing the names to the phb4 versions.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
3 weeks agonpu2: Disable Probe-to-Invalid-Return-Modified-or-Owned snarfing by default
Alexey Kardashevskiy [Mon, 29 Apr 2019 09:12:27 +0000 (19:12 +1000)] 
npu2: Disable Probe-to-Invalid-Return-Modified-or-Owned snarfing by default

V100 GPUs are known to violate NVLink2 protocol in some cases (one is when
memory was accessed by the CPU and then by GPU using so called block
linear mapping) and issue double probes to NPU which can cope with this
problem only if CONFIG_ENABLE_SNARF_CPM ("disable/enable Probe.I.MO
snarfing a cp_m") is not set in the CQ_SM Misc Config register #0.
If the bit is set (which is the case today), NPU issues the machine
check stop.

The snarfing feature is designed to detect 2 probes in flight and combine
them into one.

This adds a new "opal-npu2-snarf-cpm" nvram variable which controls
CONFIG_ENABLE_SNARF_CPM for all NVLinks to prevent the machine check
stop from happening.

This disables snarfing by default as otherwise a broken GPU driver can
crash the entire box even when a GPU is passed through to a guest.
This provides a dial to allow regression tests (which might be useful on
bare metal). To enable snarfing, the user needs to run:

sudo nvram -p ibm,skiboot --update-config opal-npu2-snarf-cpm=enable

and reboot the host system.

While at this, define macros for register names as well to avoid touching
same lines over and over again.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
3 weeks agocore/init: LPC isn't just P8 (fix comment)
Stewart Smith [Wed, 1 May 2019 04:21:06 +0000 (14:21 +1000)] 
core/init: LPC isn't just P8 (fix comment)

Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
3 weeks agodoc: Add (most) nvram debugging options
Stewart Smith [Tue, 30 Apr 2019 04:38:43 +0000 (14:38 +1000)] 
doc: Add (most) nvram debugging options

Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
3 weeks agohw/npu2: Show name of opencapi error interrupts
Frederic Barrat [Wed, 24 Apr 2019 15:31:06 +0000 (17:31 +0200)] 
hw/npu2: Show name of opencapi error interrupts

Add the name of which error interrupt is received.

Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
3 weeks agocore/pci: Use PHB io-base-location by default for PHB slots
Oliver O'Halloran [Tue, 23 Apr 2019 07:56:07 +0000 (17:56 +1000)] 
core/pci: Use PHB io-base-location by default for PHB slots

On witherspoon only the GPU slots and the three pluggable PCI slots
(SLOT0, 1, 2) have platform defined slot names. For builtin devices such
as the SATA controller or the PLX switch that fans out to the GPU slots
we have no location codes which some people consider an issue.

This patch addresses the problem by making the ibm,slot-location-code for
the root port device default to the ibm,io-base-location-code which is
typically the location code for the system itself.

e.g.

pciex@600c3c0100000/ibm,loc-code
                 "UOPWR.0000000-Node0-Proc0"

pciex@600c3c0100000/pci@0/ibm,loc-code
                 "UOPWR.0000000-Node0-Proc0"

pciex@600c3c0100000/pci@0/usb-xhci@0/ibm,loc-code
                 "UOPWR.0000000-Node0"

The PHB node, and the root complex nodes have a loc code of the
processor they are attached to, while the usb-xhci device under the
root port has a location code of the system itself.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
3 weeks agohw/phb4: Read ibm,loc-code from PBCQ node
Oliver O'Halloran [Tue, 23 Apr 2019 07:56:06 +0000 (17:56 +1000)] 
hw/phb4: Read ibm,loc-code from PBCQ node

On P9 the PBCQs are subdivided by stacks which implement the PCI Express
logic. When phb4 was forked from phb3 most of the properties that were
in the pbcq node moved into the stack node, but ibm,loc-code was not one
of them. This patch fixes the phb4 init sequence to read the base
location code from the PBCQ node (parent of the stack node) rather than
the stack node itself.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
5 weeks agohw/xscom: P9P rather than P9
Stewart Smith [Wed, 17 Apr 2019 08:11:26 +0000 (18:11 +1000)] 
hw/xscom: P9P rather than P9

Fixes: 2c8f96534a978bb4cac3e4b7dd393a9cc4926555
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
5 weeks agohw/xscom: add missing P9P chip name
Nicholas Piggin [Tue, 16 Apr 2019 05:34:46 +0000 (15:34 +1000)] 
hw/xscom: add missing P9P chip name

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
5 weeks agoasm/head: balance branches to avoid link stack predictor mispredicts
Nicholas Piggin [Sun, 14 Apr 2019 05:18:37 +0000 (15:18 +1000)] 
asm/head: balance branches to avoid link stack predictor mispredicts

The Linux wrapper for OPAL call and return is arranged like this:

  __opal_call:
      mflr   r0
      std    r0,PPC_STK_LROFF(r1)
      LOAD_REG_ADDR(r11, opal_return)
      mtlr   r11
      hrfid  -> OPAL

  opal_return:
      ld     r0,PPC_STK_LROFF(r1)
      mtlr   r0
      blr

When skiboot returns to Linux, it branches to LR (i.e., opal_return)
with a blr. This unbalances the link stack predictor and will cause
mispredicts back up the return stack.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
5 weeks agoexternal/mambo: also invoke readline for the non-autorun case
Nicholas Piggin [Sat, 13 Apr 2019 10:38:29 +0000 (20:38 +1000)] 
external/mambo: also invoke readline for the non-autorun case

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
5 weeks agoasm/head.S: set POWER9 radix HID bit at entry
Nicholas Piggin [Fri, 12 Apr 2019 04:05:29 +0000 (14:05 +1000)] 
asm/head.S: set POWER9 radix HID bit at entry

When running in virtual memory mode, the radix MMU hid bit should not
be changed, so set this in the initial boot SPR setup.

As a side effect, fast reboot also has HID0:RADIX bit set by the
shared spr init, so no need for an explicit call.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
5 weeks agoopal-prd: Fix memory leak in is-fsp-system check
Vasant Hegde [Tue, 9 Apr 2019 11:51:25 +0000 (17:21 +0530)] 
opal-prd: Fix memory leak in is-fsp-system check

Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
5 weeks agoopal-prd: Check malloc return value
Vasant Hegde [Tue, 9 Apr 2019 11:51:24 +0000 (17:21 +0530)] 
opal-prd: Check malloc return value

Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
5 weeks agoMakefile: Build with symbols
Joel Stanley [Thu, 11 Apr 2019 05:10:13 +0000 (14:40 +0930)] 
Makefile: Build with symbols

Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
5 weeks agohw/phb4: Squash the IO bridge window
Oliver O'Halloran [Thu, 7 Mar 2019 02:40:08 +0000 (13:40 +1100)] 
hw/phb4: Squash the IO bridge window

The PCI-PCI bridge spec says that bridges that implement an IO window
should hardcode the IO base and limit registers to zero.
Unfortunately, these registers only define the upper bits of the IO
window and the low bits are assumed to be 0 for the base and 1 for the
limit address. As a result, setting both to zero can be mis-interpreted
as a 4K IO window.

This patch fixes the problem the same way PHB3 does. It sets the IO base
and limit values to 0xf000 and 0x1000 respectively which most software
interprets as a disabled window.

lspci before patch:

0000:00:00.0 PCI bridge: IBM Device 04c1 (prog-if 00 [Normal decode])
I/O behind bridge: 00000000-00000fff

lspci after patch:

0000:00:00.0 PCI bridge: IBM Device 04c1 (prog-if 00 [Normal decode])
I/O behind bridge: None
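
The decoding described above can be sketched like this, assuming 16-bit I/O
addressing and treating the base and limit as the 16-bit addresses quoted in
this message. io_window_decode_sketch() is a hypothetical name, not a
skiboot or Linux function.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Decode a bridge I/O window from its base and limit values.
 * Only the upper four address bits are programmable: the low 12 bits are
 * assumed to be 0 for the base and all ones for the limit.
 * Returns false when the window is disabled (base above limit).
 */
bool io_window_decode_sketch(uint16_t base, uint16_t limit,
                             uint16_t *start, uint16_t *end)
{
    uint16_t s = (uint16_t)(base & 0xf000);
    uint16_t e = (uint16_t)((limit & 0xf000) | 0x0fff);

    if (s > e)
        return false;

    *start = s;
    *end = e;
    return true;
}
```

Base and limit both zero decode to 0x0000..0x0fff, the 4K window seen in the
"before" output, while 0xf000 and 0x1000 decode to a base above the limit,
which is reported as no window.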

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
5 weeks agobuild: link with --orphan-handling=warn
Nicholas Piggin [Sun, 14 Apr 2019 12:50:41 +0000 (22:50 +1000)] 
build: link with --orphan-handling=warn

The linker can warn when the linker script does not explicitly place
all sections. These orphan sections are placed according to
heuristics, which may not always be desirable. Enable this warning.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
5 weeks agoopal-ci: Centos7 with latest crosstool toolchain (gcc 8.1.0)
Stewart Smith [Wed, 17 Apr 2019 05:50:48 +0000 (15:50 +1000)] 
opal-ci: Centos7 with latest crosstool toolchain (gcc 8.1.0)

Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
5 weeks agobuild/lds: place remaining sections according to defaults
Nicholas Piggin [Sun, 14 Apr 2019 12:50:40 +0000 (22:50 +1000)] 
build/lds: place remaining sections according to defaults

Place remaining orphan linker sections according to default script
as described by `ld --verbose`.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
5 weeks agobuild/lds: place debug sections according to defaults
Nicholas Piggin [Sun, 14 Apr 2019 12:50:39 +0000 (22:50 +1000)] 
build/lds: place debug sections according to defaults

Place debug orphan linker sections according to default script
as described by `ld --verbose`.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
5 weeks agobuild: -fno-asynchronous-unwind-tables
Nicholas Piggin [Sun, 14 Apr 2019 12:50:38 +0000 (22:50 +1000)] 
build: -fno-asynchronous-unwind-tables

skiboot does not use unwind tables, this option saves about 100kB,
mostly from .text.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
5 weeks agochiptod: Remove unused prototype from header
Jordan Niethe [Tue, 16 Apr 2019 05:30:23 +0000 (15:30 +1000)] 
chiptod: Remove unused prototype from header

There is a prototype for chiptod_reset_tb() in include/chiptod.h. However
no definition is ever provided, nor is it ever used. Remove the
prototype.

Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
5 weeks agohw/xscom: Enable sw xstop by default on p9
Oliver O'Halloran [Tue, 16 Apr 2019 01:57:01 +0000 (11:57 +1000)] 
hw/xscom: Enable sw xstop by default on p9

This was disabled at some point during bringup to make life easier for
the lab folks trying to debug NVLink issues. This hack really should
have never made it out into the wild though, so we now have the
following situation occurring in the field:

 1) A bad happens
 2) The host kernel receives an unrecoverable HMI and calls into OPAL to
    request a platform reboot.
 3) OPAL rejects the reboot attempt and returns to the kernel with
    OPAL_PARAMETER.
 4) Kernel panics and attempts to kexec into a kdump kernel.

A side effect of the HMI seems to be CPUs becoming stuck which results
in the initialisation of the kdump kernel taking an extremely long time
(6+ hours). It's also been observed that after performing a dump the
kdump kernel then crashes itself because OPAL has ended up in a bad
state as a side effect of the HMI.

All up, it's not very good so re-enable the software checkstop by
default. If people still want to turn it off they can using the nvram
override.

Cc: skiboot-stable@lists.ozlabs.org
Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Acked-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
5 weeks agoopal/hmi: Initialize the hmi event with old value of TFMR.
Mahesh Salgaonkar [Tue, 16 Apr 2019 09:29:28 +0000 (14:59 +0530)] 
opal/hmi: Initialize the hmi event with old value of TFMR.

Do this before we fix TFAC errors. Otherwise the event at host console
shows no thread error reported in TFMR register.

Without this patch the console event shows TFMR with no thread error:
(DEC parity error TFMR[59] injection)

[   53.737572] Severe Hypervisor Maintenance interrupt [Recovered]
[   53.737596]  Error detail: Timer facility experienced an error
[   53.737611]  HMER: 0840000000000000
[   53.737621]  TFMR: 3212000870e04000

After this patch it shows old TFMR value on host console:

[ 2302.267271] Severe Hypervisor Maintenance interrupt [Recovered]
[ 2302.267305]  Error detail: Timer facility experienced an error
[ 2302.267320]  HMER: 0840000000000000
[ 2302.267330]  TFMR: 3212000870e14010
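
The two TFMR values above can be compared bit by bit. The sketch below uses
IBM bit numbering (bit 0 is the most significant bit of the 64-bit
register); the macro and function names are local to this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* IBM bit numbering: bit 0 is the MSB of a 64-bit register. */
#define PPC_BIT_SKETCH(n) (1ULL << (63 - (n)))

/* True if the given (IBM-numbered) bit is set in a TFMR value. */
bool tfmr_bit_set_sketch(uint64_t tfmr, unsigned int bit)
{
    return (tfmr & PPC_BIT_SKETCH(bit)) != 0;
}

/* Bits that differ between two TFMR snapshots. */
uint64_t tfmr_diff_sketch(uint64_t before, uint64_t after)
{
    return before ^ after;
}
```

Bit 59 is clear in 3212000870e04000 and set in 3212000870e14010, matching
the injected DEC parity error. The two values also differ in bit 47.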

Fixes: 674f7696f ("opal/hmi: Rework HMI handling of TFAC errors")
Cc: skiboot-stable@lists.ozlabs.org
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
5 weeks agocore/pci: Prefer ibm, slot-label when finding loc codes
Oliver O'Halloran [Mon, 15 Apr 2019 03:25:15 +0000 (13:25 +1000)] 
core/pci: Prefer ibm, slot-label when finding loc codes

On OpenPower systems the ibm,slot-label property is used to identify
slots rather than the more verbose ibm,slot-location-code. The
slot-label lookup is currently broken since it assumes that the
ibm,slot-label is in the PCI device node rather than in the node of the
device that provides the slot (e.g. root port or switch downstream
port).

This patch corrects the lookup code to search the parent node (and
possibly its grandparents), similar to how we search for
ibm,slot-location-code.
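
The corrected lookup amounts to walking up from the device node. A minimal
sketch, using a hypothetical node structure rather than skiboot's device
tree API:

```c
#include <stddef.h>

/* Hypothetical device tree node: only what the lookup needs. */
struct node_sketch {
    const struct node_sketch *parent;
    const char *slot_label;  /* ibm,slot-label value, or NULL if absent */
};

/*
 * Find the slot label for a PCI device. The label belongs to the node that
 * provides the slot (root port or switch downstream port), so the search
 * starts at the parent and continues through the grandparents. The device
 * node itself is deliberately not consulted.
 */
const char *find_slot_label_sketch(const struct node_sketch *dev)
{
    const struct node_sketch *n;

    for (n = dev->parent; n != NULL; n = n->parent) {
        if (n->slot_label != NULL)
            return n->slot_label;
    }
    return NULL;
}
```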

Fixes: 1c3baae4f2b3 ("hdata/iohub: Look for IOVPD on P9")
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
6 weeks agoskiboot v6.3-rc2 release notes v6.3-rc2
Stewart Smith [Thu, 11 Apr 2019 04:57:57 +0000 (14:57 +1000)] 
skiboot v6.3-rc2 release notes

Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
6 weeks agotest-ipmi-hiomap: Add read-one-byte test
Vasant Hegde [Mon, 8 Apr 2019 06:05:41 +0000 (11:35 +0530)] 
test-ipmi-hiomap: Add read-one-byte test

Add test case to read:
  - 1 byte
  - 1 block and 1 byte data

Cc: Andrew Jeffery <andrew@aj.id.au>
Cc: skiboot-stable@lists.ozlabs.org
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Reviewed-by: Andrew Jeffery <andrew@aj.id.au>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
6 weeks agotest-ipmi-hiomap: Fix lpc-read-success
Vasant Hegde [Mon, 8 Apr 2019 06:05:40 +0000 (11:35 +0530)] 
test-ipmi-hiomap: Fix lpc-read-success

Cc: Andrew Jeffery <andrew@aj.id.au>
Cc: skiboot-stable@lists.ozlabs.org
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Reviewed-by: Andrew Jeffery <andrew@aj.id.au>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
6 weeks agotest-ipmi-hiomap: Add write-one-byte test
Vasant Hegde [Mon, 8 Apr 2019 06:05:39 +0000 (11:35 +0530)] 
test-ipmi-hiomap: Add write-one-byte test

Add test case to write:
  - 1 byte
  - 1 block and 1 byte data

Cc: Andrew Jeffery <andrew@aj.id.au>
Cc: skiboot-stable@lists.ozlabs.org
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Reviewed-by: Andrew Jeffery <andrew@aj.id.au>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
6 weeks agotest-ipmi-hiomap: Assert if size is zero
Vasant Hegde [Mon, 8 Apr 2019 06:05:38 +0000 (11:35 +0530)] 
test-ipmi-hiomap: Assert if size is zero

Cc: Andrew Jeffery <andrew@aj.id.au>
Cc: skiboot-stable@lists.ozlabs.org
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Reviewed-by: Andrew Jeffery <andrew@aj.id.au>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
6 weeks agolibflash/ipmi-hiomap: Fix blocks count issue
Vasant Hegde [Mon, 8 Apr 2019 06:05:37 +0000 (11:35 +0530)] 
libflash/ipmi-hiomap: Fix blocks count issue

We convert data size to block count and pass block count to BMC.
If data size is not block aligned then we end up sending block count
less than actual data. BMC will write partial data to flash memory.

Sample log :
[  594.388458416,7] HIOMAP: Marked flash dirty at 0x42010 for 8
[  594.398756487,7] HIOMAP: Flushed writes
[  594.409596439,7] HIOMAP: Marked flash dirty at 0x42018 for 3970
[  594.419897507,7] HIOMAP: Flushed writes

In this case HIOMAP sent data with block count=0 and hence BMC didn't
flush data to flash.

Let's fix this issue by adjusting block count before sending it to BMC.
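
The difference between truncating and rounding up can be sketched as below.
This assumes a 4K block (shift of 12) for the worked numbers and is an
illustration only; the function names are hypothetical and the actual fix
in libflash may be structured differently.

```c
#include <stdint.h>

/* Truncating conversion: a length shorter than one block yields 0. */
uint32_t blocks_truncated_sketch(uint32_t len, unsigned int shift)
{
    return len >> shift;
}

/*
 * Rounded-up conversion: count every block touched by [pos, pos + len),
 * including the partial blocks at either end.
 */
uint32_t blocks_rounded_up_sketch(uint32_t pos, uint32_t len,
                                  unsigned int shift)
{
    uint32_t block_size = UINT32_C(1) << shift;
    uint32_t delta = pos & (block_size - 1);  /* offset into first block */

    if (len == 0)
        return 0;
    return (delta + len + block_size - 1) >> shift;
}
```

For the second log entry (0x42018 for 3970 bytes) truncation gives 0 blocks,
whereas rounding up gives 1.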

Cc: Andrew Jeffery <andrew@aj.id.au>
Cc: skiboot-stable@lists.ozlabs.org
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Reviewed-by: Andrew Jeffery <andrew@aj.id.au>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
6 weeks agoopal/hmi: Never trust a cow!
Frederic Barrat [Fri, 5 Apr 2019 14:33:04 +0000 (16:33 +0200)] 
opal/hmi: Never trust a cow!

With opencapi, it's fairly common to trigger HMIs during AFU
development on the FPGA, by not replying in time to an NPU command,
for example. So shift the blame reported by that cow to avoid crowding
my mailbox.

Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
6 weeks agohw/npu2: Dump (more) npu2 registers on link error and HMIs
Frederic Barrat [Fri, 5 Apr 2019 14:33:03 +0000 (16:33 +0200)] 
hw/npu2: Dump (more) npu2 registers on link error and HMIs

We were already logging some NPU registers during an HMI. This patch
cleans up a bit how it is done and separates what is global from what
is specific to nvlink or opencapi.

Since we can now receive an error interrupt when an opencapi link goes
down unexpectedly, we also dump the NPU state but we limit it to the
registers of the brick which hit the error.

The list of registers to dump was worked out with the hw team to
allow for proper debugging. For each register, we print the name as
found in the NPU workbook, the scom address and the register value.

Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>