skiboot.git
18 hours ago skiboot v6.0.15 release notes master github/master
Vasant Hegde [Fri, 14 Dec 2018 05:59:14 +0000 (11:29 +0530)] 
skiboot v6.0.15 release notes

[ Upstream commit 3bcfff5498b73bdd5697f2e4e0a8b414ad0ae680 ]

Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
3 days ago skiboot v6.2 release notes v6.2
Stewart Smith [Fri, 14 Dec 2018 05:30:09 +0000 (16:30 +1100)] 
skiboot v6.2 release notes

Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
4 days ago opal-ci: Drop fedora27, add fedora29
Stewart Smith [Thu, 29 Nov 2018 05:54:06 +0000 (16:54 +1100)] 
opal-ci: Drop fedora27, add fedora29

Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
4 days ago ci: Bump Qemu version
Joel Stanley [Wed, 12 Dec 2018 05:02:58 +0000 (15:32 +1030)] 
ci: Bump Qemu version

This moves the qemu version to qemu-powernv-for-skiboot-7 which is based
on upstream's 3.1.0, and supports a Power9 machine.

It also includes a fix for the skiboot XSCOM errors:

 XSCOM: read error gcid=0x0 pcb_addr=0x1020013 stat=0x0

There is no modelling of the xscom behaviour but the reads/writes
now succeed which is enough for skiboot to not error out.

Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
4 days ago test: Update qemu arguments to use bmc simulator
Joel Stanley [Wed, 12 Dec 2018 05:02:57 +0000 (15:32 +1030)] 
test: Update qemu arguments to use bmc simulator

The qemu skiboot platform as of 8340a9642bba ("plat/qemu: use the common
OpenPOWER routines to initialize") uses the common aspeed BMC setup
routines. This means a BT interface is always set up, and if the
corresponding Qemu model is not present the timeout is 30 seconds.

It looks like this every time an IPMI message is sent:

 BT: seq 0x9e netfn 0x06 cmd 0x31: Maximum queue length exceeded
 BT: seq 0x9d netfn 0x06 cmd 0x31: Removed from queue
 BT: seq 0x9f netfn 0x06 cmd 0x31: Maximum queue length exceeded
 BT: seq 0x9e netfn 0x06 cmd 0x31: Removed from queue
 BT: seq 0xa0 netfn 0x06 cmd 0x31: Maximum queue length exceeded
 BT: seq 0x9f netfn 0x06 cmd 0x31: Removed from queue

Avoid this by adding the bmc simulator model to the Qemu powernv
machine.

Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
4 days ago ci: Add opal-utils to Debian unstable
Joel Stanley [Wed, 12 Dec 2018 05:02:56 +0000 (15:32 +1030)] 
ci: Add opal-utils to Debian unstable

This puts a 'pflash' in the user's PATH, allowing more test coverage of
ffspart.

Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
4 days ago ci: Drop P8 mambo from Debian unstable
Joel Stanley [Wed, 12 Dec 2018 05:02:55 +0000 (15:32 +1030)] 
ci: Drop P8 mambo from Debian unstable

Debian Unstable has removed OpenSSL 1.0.0 from the repository so mambo
no longer runs:

  /opt/ibm/systemsim-p8/bin/systemsim-pegasus: error while loading shared
  libraries: libcrypto.so.1.0.0: cannot open shared object file: No such
  file or directory

By removing it from the container these tests will be automatically
skipped.

Tracked in https://github.com/open-power/op-build/issues/2519

Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
4 days ago ci: Add dtc dependencies for rawhide
Joel Stanley [Wed, 12 Dec 2018 05:02:54 +0000 (15:32 +1030)] 
ci: Add dtc dependencies for rawhide

Both F28 and Rawhide build their own dtc version. Rawhide was missing
the required build deps.

Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
4 days ago ci: Update Debian unstable packages
Joel Stanley [Wed, 12 Dec 2018 05:02:53 +0000 (15:32 +1030)] 
ci: Update Debian unstable packages

This syncs Debian unstable with Ubuntu 18.04 in order to get the clang
package. It also adds qemu to the Debian install, which makes sense as
Debian also has 2.12.

Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
4 days ago ci: Use Ubuntu latest config for Debian unstable
Joel Stanley [Wed, 12 Dec 2018 05:02:52 +0000 (15:32 +1030)] 
ci: Use Ubuntu latest config for Debian unstable

Debian unstable has the same GCOV issue with 8.2 as Ubuntu latest so it
makes sense to share configurations there.

Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
4 days ago ci: Disable GCOV builds in ubuntu-latest
Joel Stanley [Wed, 12 Dec 2018 05:02:51 +0000 (15:32 +1030)] 
ci: Disable GCOV builds in ubuntu-latest

They are known to be broken with GCC 8.2:

 https://github.com/open-power/skiboot/issues/206

Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
4 days ago ci: Update gcov comment in Fedora 28
Joel Stanley [Wed, 12 Dec 2018 05:02:50 +0000 (15:32 +1030)] 
ci: Update gcov comment in Fedora 28

Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
4 days ago plat/qemu: fix platform initialization when the BT device is not present
Cédric Le Goater [Wed, 12 Dec 2018 08:25:26 +0000 (09:25 +0100)] 
plat/qemu: fix platform initialization when the BT device is not present

A QEMU PowerNV machine does not necessarily have a BT device. It needs
to be defined on the command line with:

  -device ipmi-bmc-sim,id=bmc0 -device isa-ipmi-bt,bmc=bmc0,irq=10

When the QEMU platform is initialized by skiboot, we need to check
that such a device is present and if not, skip the AST initialization.

Fixes: 8340a9642bba ("plat/qemu: use the common OpenPOWER routines to initialize")
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
5 days ago i2c: Fix i2c request hang during opal init if timers are not checked
Frederic Barrat [Thu, 29 Nov 2018 18:17:17 +0000 (19:17 +0100)] 
i2c: Fix i2c request hang during opal init if timers are not checked

If an i2c request cannot go through the first time, because the bus is
found in error and needs a reset or it's locked by the OCC for example,
the underlying i2c implementation is using timers to manage the
request. However during opal init, opal pollers may not be called; it
depends on the context in which the i2c request is made. If the
pollers are not called, the timers are not checked and we can end up
with an i2c request which will not move forward and skiboot hangs.

Fix it by explicitly checking the timers if we are waiting for an i2c
request to complete and it seems to be taking a while.
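As a toy model of the hang and the fix (not the skiboot i2c code): the request below only completes once its timer has been checked a couple of times. Every name here (`sim_request`, `sim_check_timers`, `sim_wait_request`, the `grace` parameter) is hypothetical and only illustrates the idea of checking timers explicitly once the wait has gone on for a while.

```c
#include <assert.h>
#include <stdbool.h>

/* Simulated request: completes after its timer has fired a given number of times. */
struct sim_request {
    bool done;
    unsigned int timer_fires_needed;
};

/* Stand-in for a timer check: each call lets one pending timer fire. */
static void sim_check_timers(struct sim_request *r)
{
    if (r->timer_fires_needed && --r->timer_fires_needed == 0)
        r->done = true;
}

/*
 * Wait for the request. If pollers are running they check the timers on
 * every iteration. If not, the waiter checks them itself once `grace`
 * iterations have passed. Returns 0 on completion, -1 if the request
 * never completed within max_iters (the "hang").
 */
static int sim_wait_request(struct sim_request *r, bool pollers_run,
                            unsigned int grace, unsigned int max_iters)
{
    for (unsigned int i = 0; i < max_iters && !r->done; i++) {
        if (pollers_run || i >= grace)
            sim_check_timers(r);
    }
    return r->done ? 0 : -1;
}
```

Setting `grace` above `max_iters` with `pollers_run` false models the old behaviour: nothing ever checks the timers and the wait never finishes.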

Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Tested-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
5 days ago opal-prd: hservice: Enable hservice->wakeup() in BMC
Shilpasri G Bhat [Tue, 13 Nov 2018 05:28:54 +0000 (10:58 +0530)] 
opal-prd: hservice: Enable hservice->wakeup() in BMC

This patch enables HBRT to use HYP special wakeup register in openBMC
which until now was only used in FSP based machines.

This patch also adds a capability check for opal-prd so that HBRT can
decide if the host special wakeup register can be used.

Fixes: 49999302251b ("opal-prd: Add support for runtime OCC reset in ZZ")
Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
5 days ago ffspart: Support flashing already ECC protected images
Stewart Smith [Mon, 10 Dec 2018 06:48:49 +0000 (17:48 +1100)] 
ffspart: Support flashing already ECC protected images

We do this by assuming filenames with '.ecc' in them are already ECC
protected.
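A minimal sketch of that filename rule, assuming a plain substring match. The helper name `filename_is_ecc` is made up for illustration and is not taken from ffspart.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: any filename containing ".ecc" is assumed to be
 * ECC protected already, so it should not be ECC encoded a second time. */
static bool filename_is_ecc(const char *filename)
{
    return filename != NULL && strstr(filename, ".ecc") != NULL;
}
```

Note that a substring match also accepts names such as `part.ecc.signed`, which fits the "with '.ecc' in them" wording above.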

This solves a practical problem in transitioning op-build to use ffspart
for pnor assembly rather than three perl scripts and a lot of XML.

We also update the ffspart tests to take into account ECC requirements.

Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
Reviewed-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
5 days ago ffspart: Increase MAX_LINE to above PATH_MAX
Stewart Smith [Mon, 3 Dec 2018 00:19:36 +0000 (11:19 +1100)] 
ffspart: Increase MAX_LINE to above PATH_MAX

Otherwise we saw failures in CI with the ~221 character paths Jenkins
likes to have.

Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
Reviewed-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
5 days ago libflash/file: greatly increase perf of file_erase()
Stewart Smith [Mon, 3 Dec 2018 00:05:42 +0000 (11:05 +1100)] 
libflash/file: greatly increase perf of file_erase()

Do 4096 byte chunks not 8 byte chunks. A ffspart invocation constructing
a 64MB PNOR goes from a couple of seconds to ~0.1 seconds with this
patch.
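The idea can be sketched as follows. This is an illustration using stdio streams rather than the file descriptors libflash works with, and `erase_range` and `ERASE_CHUNK` are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define ERASE_CHUNK 4096 /* bytes written per call; the old code wrote 8 */

/* Fill [dst, dst + len) of a stream with 0xff, one chunk at a time.
 * Returns 0 on success, -1 on an I/O error. */
static int erase_range(FILE *f, long dst, size_t len)
{
    static unsigned char buf[ERASE_CHUNK];

    memset(buf, 0xff, sizeof(buf));
    if (fseek(f, dst, SEEK_SET) != 0)
        return -1;

    while (len > 0) {
        size_t n = len < sizeof(buf) ? len : sizeof(buf);

        if (fwrite(buf, 1, n, f) != n)
            return -1;
        len -= n;
    }
    return fflush(f) == 0 ? 0 : -1;
}
```

With 4096 byte chunks a 64MB erase needs 16384 writes instead of roughly 8.4 million, which is where the speed-up comes from.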

Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
Reviewed-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
5 days ago Revert "npu2: Allow ATSD for LPAR other than 0"
Stewart Smith [Wed, 12 Dec 2018 03:49:23 +0000 (14:49 +1100)] 
Revert "npu2: Allow ATSD for LPAR other than 0"

This reverts commit d8b161f4b361f70a7bb43be47d4a32b8f937287a.
As discussed on list, a bit premature to merge, removing for now.

Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
6 days ago hw/bt.c: Move some debug ifdef to make static analysis happy
Stewart Smith [Thu, 29 Nov 2018 04:28:33 +0000 (15:28 +1100)] 
hw/bt.c: Move some debug ifdef to make static analysis happy

Okay, so maybe the static analysis warning is all useless, and maybe
having the ifdef around a call is actually useful. I'll take the less
noise in my CI static analysis thing.

Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
6 days ago hdata/iohub.c: remove condition that was always true
Stewart Smith [Thu, 29 Nov 2018 04:28:32 +0000 (15:28 +1100)] 
hdata/iohub.c: remove condition that was always true

Caught by static analysis. The previous if() condition was ensuring lxr
was not null, so we don't need this additional check.

Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
6 days ago vpd: Force static analysis to not think about NULL term strings
Stewart Smith [Thu, 29 Nov 2018 04:28:31 +0000 (15:28 +1100)] 
vpd: Force static analysis to not think about NULL term strings

Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
6 days ago opal_sync_host_reboot: clarify when we return OPAL_BUSY_EVENT
Stewart Smith [Thu, 29 Nov 2018 04:28:30 +0000 (15:28 +1100)] 
opal_sync_host_reboot: clarify when we return OPAL_BUSY_EVENT

Basically to shut up static analysis of using a boolean in a non-boolean
context (bitwise).

Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
6 days ago opal_trace_entry: Move ifdef around to shut up static analysis
Stewart Smith [Thu, 29 Nov 2018 04:28:29 +0000 (15:28 +1100)] 
opal_trace_entry: Move ifdef around to shut up static analysis

Again, this makes things look slightly different so I don't keep seeing
the static analysis warning.

Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
6 days ago mem_region.c: Move ifdef for MEM_POISON to shut up static analysis
Stewart Smith [Thu, 29 Nov 2018 04:28:28 +0000 (15:28 +1100)] 
mem_region.c: Move ifdef for MEM_POISON to shut up static analysis

The static analysis tool is arguably wrong and should go away.

But... I'm sick of keeping coming back to it and reviewing the false
positives enough to make a slight change to where ifdefs are.

Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
6 days ago Change ifdef around dump_fdt() to shut up static analysis
Stewart Smith [Thu, 29 Nov 2018 04:28:27 +0000 (15:28 +1100)] 
Change ifdef around dump_fdt() to shut up static analysis

This is a dumb warning from a certain static analysis tool that a
function has no effect when the ifdef that would make it have an effect
isn't defined and we replace it with a no-op impl.

Putting the #ifdef around the call just so I don't have to discount this
damn static analysis false positive every time I go and look at the
results.

Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
6 days ago core/cpu.c: avoid container_of(NULL) in next_cpu()
Stewart Smith [Thu, 29 Nov 2018 04:28:26 +0000 (15:28 +1100)] 
core/cpu.c: avoid container_of(NULL) in next_cpu()

A certain finicky static analysis tool did point out that we were
operating on a value that could be null (and since first_cpu() calls
next_cpu(NULL) to get the first one, it also gets to be complained about
as next_cpu() could act on that NULL pointer).

So, rework things to shut the static analysis tool up, when in fact this
was never a problem.
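The pattern can be sketched like this: do the NULL check first and only then apply container_of(). The structures, the array size and the `ex_` names are simplified stand-ins chosen for illustration, not the real skiboot definitions.

```c
#include <assert.h>
#include <stddef.h>

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified stand-ins: a per-CPU stack area with the CPU struct embedded. */
struct ex_cpu_thread { unsigned int pir; };
struct ex_cpu_stack {
    char stack[64];
    struct ex_cpu_thread cpu;
};

#define EX_NR_CPUS 4
static struct ex_cpu_stack ex_stacks[EX_NR_CPUS];

/* Return the CPU after `cpu`, or the first CPU when `cpu` is NULL.
 * container_of() is only evaluated on a non-NULL pointer. */
static struct ex_cpu_thread *ex_next_cpu(struct ex_cpu_thread *cpu)
{
    size_t index = 0;

    if (cpu != NULL) {
        struct ex_cpu_stack *s = container_of(cpu, struct ex_cpu_stack, cpu);

        index = (size_t)(s - ex_stacks) + 1;
    }
    if (index >= EX_NR_CPUS)
        return NULL;
    return &ex_stacks[index].cpu;
}
```

A caller can then walk all CPUs with `for (c = ex_next_cpu(NULL); c; c = ex_next_cpu(c))`.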

Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
6 days ago Add purging CPU L2 and L3 caches into NPU hreset.
Rashmica Gupta [Wed, 5 Dec 2018 02:39:28 +0000 (13:39 +1100)] 
Add purging CPU L2 and L3 caches into NPU hreset.

If a GPU is passed through to a guest and the guest unexpectedly terminates,
there can be cache lines in CPUs that belong to the GPU. So purge the caches
as part of the reset sequence. L1 is write through, so doesn't need to be purged.

The sequence to purge the L2 and L3 caches from the hw team:

"L2 purge:
 (1) initiate purge
 putspy pu.ex EXP.L2.L2MISC.L2CERRS.PRD_PURGE_CMD_TYPE L2CAC_FLUSH -all
 putspy pu.ex EXP.L2.L2MISC.L2CERRS.PRD_PURGE_CMD_TRIGGER ON -all

 (2) check this is off in all caches to know purge completed
 getspy pu.ex EXP.L2.L2MISC.L2CERRS.PRD_PURGE_CMD_REG_BUSY -all

 (3) putspy pu.ex EXP.L2.L2MISC.L2CERRS.PRD_PURGE_CMD_TRIGGER OFF -all

L3 purge:
 1) Start the purge:
 putspy pu.ex EXP.L3.L3_MISC.L3CERRS.L3_PRD_PURGE_TTYPE FULL_PURGE -all
 putspy pu.ex EXP.L3.L3_MISC.L3CERRS.L3_PRD_PURGE_REQ ON -all

 2) Ensure that the purge has completed by checking the status bit:
 getspy pu.ex EXP.L3.L3_MISC.L3CERRS.L3_PRD_PURGE_REQ -all

 You should see it say OFF if it's done:
 p9n.ex k0:n0:s0:p00:c0
 EXP.L3.L3_MISC.L3CERRS.L3_PRD_PURGE_REQ
 OFF"
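The L2 steps quoted above (trigger, poll busy, clear trigger) can be sketched against a simulated cache. This is only a model of the control flow: `sim_l2` and its helpers are hypothetical, real code would use XSCOM register accesses, and setting the purge command type in step (1) is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Simulated L2 purge state for one core. */
struct sim_l2 {
    bool trigger;            /* models PRD_PURGE_CMD_TRIGGER */
    unsigned int busy_reads; /* reads that still report busy */
};

static void sim_set_trigger(struct sim_l2 *c, bool on)
{
    c->trigger = on;
}

/* Models reading PRD_PURGE_CMD_REG_BUSY. */
static bool sim_busy(struct sim_l2 *c)
{
    if (c->busy_reads) {
        c->busy_reads--;
        return true;
    }
    return false;
}

/* Returns 0 once the purge completed, -1 if still busy after max_polls. */
static int sim_l2_purge(struct sim_l2 *c, unsigned int max_polls)
{
    sim_set_trigger(c, true);              /* (1) initiate purge */
    for (unsigned int i = 0; i < max_polls; i++) {
        if (!sim_busy(c)) {                /* (2) wait for busy to clear */
            sim_set_trigger(c, false);     /* (3) turn the trigger off */
            return 0;
        }
    }
    return -1;
}
```

The L3 sequence has the same shape, except that the request bit itself reads back as OFF when the purge is done.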

Suggested-by: Alistair Popple <alistair@popple.id.au>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Rashmica Gupta <rashmica.g@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
6 days ago plat/qemu: use the common OpenPOWER routines to initialize
Cédric Le Goater [Mon, 3 Dec 2018 12:38:31 +0000 (13:38 +0100)] 
plat/qemu: use the common OpenPOWER routines to initialize

Back in 2016, we did not have broad support for the PowerNV devices
under QEMU and we were using our own custom ones. This has changed and
we can now use all the common init routines of the OpenPOWER
platforms.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
6 days ago npu2: Allow ATSD for LPAR other than 0
Alexey Kardashevskiy [Wed, 5 Dec 2018 04:52:22 +0000 (15:52 +1100)] 
npu2: Allow ATSD for LPAR other than 0

Each XTS MMIO ATSD# register is accompanied by another register -
XTS MMIO ATSD0 LPARID# - which controls LPID filtering for ATSD
transactions.

When a host system passes a GPU through to a guest, we need to enable
some ATSD for an LPAR. At the moment the host assigns one ATSD to
an NVLink bridge and this maps it to an LPAR when the GPU is assigned to
the LPAR. The link number is used for an ATSD index.

ATSD6&7 stay mapped to the host (LPAR=0) all the time which seems to be
an acceptable price for the simplicity.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
6 days ago npu2: Return sensible PCI error when not frozen
Alexey Kardashevskiy [Wed, 5 Dec 2018 23:50:13 +0000 (10:50 +1100)] 
npu2: Return sensible PCI error when not frozen

The current kernel calls OPAL_PCI_EEH_FREEZE_STATUS with an uninitialized
@pci_error_type parameter and then analyzes it even if the OPAL call
returned OPAL_SUCCESS. This results in unexpected EEH events and NPU
freezes.

This initializes @pci_error_type and @severity to known safe values.
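A sketch of the defensive pattern, assuming a simplified call signature. The `EX_` constants and `example_freeze_status()` are placeholders for illustration and are not copied from the OPAL API headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder values, assumed for this example only. */
#define EX_SUCCESS       0
#define EX_NOT_FROZEN    0
#define EX_FROZEN        1
#define EX_NO_ERROR      0
#define EX_SEV_NO_ERROR  0

/* Set every output before any early return, so a caller that inspects
 * them after a successful call never sees uninitialized data. */
static int example_freeze_status(bool frozen, uint8_t *freeze_state,
                                 uint16_t *pci_error_type, uint16_t *severity)
{
    *freeze_state = EX_NOT_FROZEN;
    *pci_error_type = EX_NO_ERROR;
    if (severity)
        *severity = EX_SEV_NO_ERROR;

    if (!frozen)
        return EX_SUCCESS;

    *freeze_state = EX_FROZEN;
    return EX_SUCCESS;
}
```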

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
6 days ago npu2: Advertise correct TCE page size
Alexey Kardashevskiy [Thu, 6 Dec 2018 08:29:10 +0000 (19:29 +1100)] 
npu2: Advertise correct TCE page size

The P9 NPU workbook says that only 4K/64K/16M/256M page sizes are supported
and in fact npu2_map_pe_dma_window() supports just these but in absence of
the "ibm,supported-tce-sizes" property Linux assumes the default P9 PHB4
page sizes - 4K/64K/2M/1G - so when Linux tries 2M/1G TCEs, we get lots of
"Unexpected TCE size" from npu2_tce_kill().

This advertises TCE page sizes so Linux could handle it correctly, i.e.
fall back to 4K/64K TCEs.
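For illustration, the supported set can be expressed as a table of page shifts with a lookup. The names `npu_tce_shifts` and `npu_tce_size_supported` are hypothetical, and how the sizes are encoded in the "ibm,supported-tce-sizes" property is not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Page shifts for 4K, 64K, 16M and 256M, the sizes named in the workbook. */
static const unsigned int npu_tce_shifts[] = { 12, 16, 24, 28 };

/* True if page_size (in bytes) is one of the supported NPU TCE page sizes. */
static bool npu_tce_size_supported(uint64_t page_size)
{
    size_t n = sizeof(npu_tce_shifts) / sizeof(npu_tce_shifts[0]);

    for (size_t i = 0; i < n; i++) {
        if (page_size == (UINT64_C(1) << npu_tce_shifts[i]))
            return true;
    }
    return false;
}
```

The PHB4 defaults of 2M and 1G are rejected by this check, which matches the "Unexpected TCE size" messages described above.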

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2 weeks ago skiboot v6.2-rc2 release notes v6.2-rc2
Stewart Smith [Thu, 29 Nov 2018 04:18:16 +0000 (15:18 +1100)] 
skiboot v6.2-rc2 release notes

Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2 weeks ago travis: Coverity fixed their SSL cert
Joel Stanley [Thu, 29 Nov 2018 00:10:55 +0000 (10:40 +1030)] 
travis: Coverity fixed their SSL cert

_   _                            _ _           _   _
( ) ( )  ___  ___  ___ _   _ _ __(_) |_ _   _  ( ) ( )
 \|  \| / __|/ _ \/ __| | | | '__| | __| | | | |/  |/
        \__ \  __/ (__| |_| | |  | | |_| |_| |
        |___/\___|\___|\__,_|_|  |_|\__|\__, |
                                        |___/

Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2 weeks ago opal-ci: Use ubuntu:rolling for Ubuntu latest image
Joel Stanley [Thu, 29 Nov 2018 00:10:54 +0000 (10:40 +1030)] 
opal-ci: Use ubuntu:rolling for Ubuntu latest image

This updates the Ubuntu 'latest' to use ubuntu:rolling, which is
the most recent release. It turns out that ubuntu:latest is actually the
latest LTS (18.04).

Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2 weeks ago ffspart: Add test for eraseblock size
Joel Stanley [Thu, 29 Nov 2018 00:01:54 +0000 (10:31 +1030)] 
ffspart: Add test for eraseblock size

This test checks that the partitions are correctly laid out when the
eraseblock size is greater than the start of the first partition.
Currently ffspart fails to create a valid image in this case.

There are two tests. The second is expected to fail but it is marked as
passing for now.

This test requires pflash to work. Currently we leave that as an
exercise for the user.

Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2 weeks ago ffspart: Add toc test
Joel Stanley [Thu, 29 Nov 2018 00:01:53 +0000 (10:31 +1030)] 
ffspart: Add toc test

This test specifies a toc in the configuration file.

There are no tests or documentation for the toc syntax, so this exists
to describe how to specify a toc.

Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2 weeks ago npu2-opencapi: Log ODL endpoint information register
Frederic Barrat [Fri, 23 Nov 2018 08:54:39 +0000 (09:54 +0100)] 
npu2-opencapi: Log ODL endpoint information register

If the link trains in degraded mode, log the ODL endpoint information
register for debug. Its content is specific to the DLx and TLx
implementation, so this is really information useful for the hardware
team.

Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2 weeks ago npu2-opencapi: Detect if link trained in degraded mode
Frederic Barrat [Fri, 23 Nov 2018 08:54:38 +0000 (09:54 +0100)] 
npu2-opencapi: Detect if link trained in degraded mode

There's no status readily available to tell the effective link
width. Instead, we have to look at the individual status of each lane,
on the transmit and receive direction. All relevant information is in
the ODL status register.
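A sketch of the per-lane check. The register layout used here (one bit per trained lane, TX in bits 15:8, RX in bits 7:0) and the x8 link width are assumptions made for the example and do not reflect the real ODL status register format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EX_LINK_WIDTH 8 /* assumed full link width, in lanes */

/* Count the set bits in v. */
static unsigned int count_bits(uint32_t v)
{
    unsigned int c = 0;

    while (v) {
        v &= v - 1; /* clear the lowest set bit */
        c++;
    }
    return c;
}

/* Hypothetical layout: bits 15:8 = trained TX lanes, bits 7:0 = trained
 * RX lanes. The link is degraded if either direction has fewer trained
 * lanes than the full width. */
static bool link_is_degraded(uint32_t status)
{
    unsigned int tx = count_bits((status >> 8) & 0xff);
    unsigned int rx = count_bits(status & 0xff);

    return tx < EX_LINK_WIDTH || rx < EX_LINK_WIDTH;
}
```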

Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2 weeks ago npu2-opencapi: Log extra information on link training failure
Frederic Barrat [Fri, 23 Nov 2018 08:54:37 +0000 (09:54 +0100)] 
npu2-opencapi: Log extra information on link training failure

Log the link training status register in case of failure to train.
It can have useful information for the hardware team.

Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2 weeks ago Don't warn on "long" OPAL_RESYNC_TIMEBASE calls
Stewart Smith [Tue, 27 Nov 2018 21:13:23 +0000 (08:13 +1100)] 
Don't warn on "long" OPAL_RESYNC_TIMEBASE calls

On P8 this is called when we exit fastsleep, and we shouldn't measure
the "time" spent in the call for what (in retrospect) is an obvious
reason.

Fixes: 50ea35c2d07874755c03e6ae2bdf7a33ad2c768a
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2 weeks ago libpore: Sync p8 files, remove erroneous "IBM Confidential"
Stewart Smith [Mon, 26 Nov 2018 23:25:02 +0000 (10:25 +1100)] 
libpore: Sync p8 files, remove erroneous "IBM Confidential"

We also had some rogue "IBM Confidential" strings that we failed to
remove with the original change of Copyright headers for open sourcing.
Do this by synchronising with the hostboot copy of the code, which
removed the Confidential string when their copyright headers changed for
initial open sourcing of the code back in 2014. See hostboot commit
3bcf5b7982bb8a2d9227dbff7be4ff2ce5fec05c where the HWP copyright headers
were updated.

We likely missed this as we did a similar process inside the skiboot
repository, but likely only on the (C) headers themselves.

The libpore changes that we were missing *look* minor, but we need to
throw some testing at them at least, as there *are* changes that we were
missing.

We also have to make a minor modification (being sent upstream) to avoid
a compiler warning of always false comparison (<0 on unsigned int)

Reported-by: Dawn Sylvia <ddzubak@us.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2 weeks ago skiboot v6.0.14 release notes
Stewart Smith [Mon, 26 Nov 2018 22:55:06 +0000 (09:55 +1100)] 
skiboot v6.0.14 release notes

Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
(cherry picked from commit f4afd85a84ab090ddda7aea18c5153755777f103)
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
3 weeks ago hdata/test: workaround dtc bugs
Stewart Smith [Mon, 26 Nov 2018 08:08:24 +0000 (19:08 +1100)] 
hdata/test: workaround dtc bugs

In dtc v1.4.5 to at least v1.4.7 there have been a few bugs introduced
that change the layout of what's produced in the dts. In order to be
immune from them, we should use the (provided) dtdiff utility, but we
also need to run the dts we're diffing against through a dtb cycle in
order to ensure we get the same format as what the hdat_to_dt to dts
conversion will.

This fixes a bunch of unit test failures on the version of dtc shipped
with recent Linux distros such as Fedora 29.

Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
3 weeks ago platform/firenze: Fix branch-to-null crash
Oliver O'Halloran [Mon, 26 Nov 2018 02:00:34 +0000 (13:00 +1100)] 
platform/firenze: Fix branch-to-null crash

When the bus alloc and free methods were removed we missed a case in the
Firenze platform slot code that relied on the bus-specific method to
initialise the bus pointer in the request structure. This results in a
branch-to-null during boot and a crash. This patch fixes it by
initialising it manually here.

Fixes: 801462feb7d6 ("core/i2c: Remove bus specific alloc and free callbacks")
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
3 weeks ago libflash: Don't merge ECC-protected ranges
Samuel Mendoza-Jonas [Wed, 21 Nov 2018 06:16:01 +0000 (17:16 +1100)] 
libflash: Don't merge ECC-protected ranges

Libflash currently merges contiguous ECC-protected ranges, but doesn't
check that the ECC bytes at the end of the first and start of the second
range actually match sanely. More importantly, if blocklevel_read() is
called with a position at the start of a partition that is contained
somewhere within a region that has been merged it will update the
position assuming ECC wasn't being accounted for. This results in the
position being somewhere well after the actual start of the partition
which is incorrect.

For now, remove the code merging ranges. This means more ranges must be
held and checked; however, it prevents incorrectly reading ECC-correct
regions like below:

[  174.334119453,7] FLASH: CAPP partition has ECC
[  174.437349574,3] ECC: uncorrectable error: ffffffffffffffff ff
[  174.437426306,3] FLASH: failed to read the first 0x1000 from CAPP partition, rc 14
[  174.439919343,3] CAPP: Error loading ucode lid. index=201d1

Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
3 weeks ago libflash: Restore blocklevel tests
Samuel Mendoza-Jonas [Wed, 21 Nov 2018 06:16:00 +0000 (17:16 +1100)] 
libflash: Restore blocklevel tests

This fell out in f58be46 "libflash/test: Rewrite Makefile.check to
improve scalability". Add it back in as test-blocklevel.

Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
Acked-by: Andrew Jeffery <andrew@aj.id.au>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
3 weeks ago Warn on long OPAL calls
Stewart Smith [Mon, 19 Nov 2018 04:17:05 +0000 (15:17 +1100)] 
Warn on long OPAL calls

Measure entry/exit time for OPAL calls and warn appropriately if the
calls take too long (>100ms gets us a DEBUG log, > 1000ms gets us a
warning).
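The two thresholds can be sketched as a small classifier. The enum and function names are made up for the example; only the 100ms and 1000ms limits come from the text above.

```c
#include <assert.h>
#include <stdint.h>

enum call_log_level {
    CALL_LOG_NONE,    /* call was fast enough, log nothing */
    CALL_LOG_DEBUG,   /* took more than 100ms */
    CALL_LOG_WARNING  /* took more than 1000ms */
};

/* Map the time spent in a call to the log level it should produce. */
static enum call_log_level classify_call_duration(uint64_t elapsed_ms)
{
    if (elapsed_ms > 1000)
        return CALL_LOG_WARNING;
    if (elapsed_ms > 100)
        return CALL_LOG_DEBUG;
    return CALL_LOG_NONE;
}
```

Both comparisons are strict, so a call taking exactly 100ms logs nothing and one taking exactly 1000ms only gets the DEBUG entry.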

Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
4 weeks ago skiboot v6.2-rc1 release notes v6.2-rc1
Stewart Smith [Mon, 19 Nov 2018 06:27:18 +0000 (17:27 +1100)] 
skiboot v6.2-rc1 release notes

Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
4 weeks ago ipmi: Reduce ipmi_queue_msg_sync() polling loop time to 10ms
Stewart Smith [Mon, 19 Nov 2018 00:44:42 +0000 (11:44 +1100)] 
ipmi: Reduce ipmi_queue_msg_sync() polling loop time to 10ms

On a plain boot, this reduces the time spent in OPAL by ~170ms on
p9dsu. This is due to hiomap (currently) using synchronous IPMI
messages.

It will also *significantly* reduce latency on runtime flash
operations, as we'll spend typically 10-20ms in OPAL rather than
100-200ms. It's not an ideal solution to that, but it's a quick
and obvious win for jitter.
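The arithmetic behind those figures, as a sketch: a caller that sleeps for a fixed interval between checks only notices a response at the next multiple of that interval. `observed_latency_ms` is a hypothetical helper, and it assumes the loop always sleeps at least once and that `interval_ms` is non-zero.

```c
#include <assert.h>
#include <stdint.h>

/* Time at which a fixed-interval polling loop first sees a response that
 * actually arrived after response_ms. The result is response_ms rounded
 * up to a whole number of intervals, with a minimum of one interval. */
static uint64_t observed_latency_ms(uint64_t response_ms, uint64_t interval_ms)
{
    uint64_t polls = (response_ms + interval_ms - 1) / interval_ms;

    if (polls == 0)
        polls = 1;
    return polls * interval_ms;
}
```

For a response that takes 12ms, a 100ms interval costs 100ms in OPAL while a 10ms interval costs 20ms, in line with the 10-20ms versus 100-200ms estimate above.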

Cc: stable
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
4 weeks ago platform/witherspoon: Fix opencapi lane-mask used on GPU0
Frederic Barrat [Wed, 14 Nov 2018 17:02:49 +0000 (18:02 +0100)] 
platform/witherspoon: Fix opencapi lane-mask used on GPU0

When an opencapi device is used via the Acorn adapter, the link used
is connected to the "middle" group of lanes of the obus. We were using
the wrong set of lanes. The link was somehow still training, likely
because the default settings at power-on were good enough, but it's
still wrong.

Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
4 weeks ago platform/witherspoon: Avoid harmless error message
Frederic Barrat [Wed, 14 Nov 2018 17:02:48 +0000 (18:02 +0100)] 
platform/witherspoon: Avoid harmless error message

The I2C read to find out if a device on the GPU slot is an opencapi
adapter or nvidia card is reporting an "arbitration loss" error if no
device is connected on the GPU slot. That I2C read is actually useless
if we already know there's no device connected, so let's skip it. It
will avoid logging a harmless error.

Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
4 weeks ago Add the other 7 ATSD registers to the device tree.
Rashmica Gupta [Fri, 16 Nov 2018 03:04:21 +0000 (14:04 +1100)] 
Add the other 7 ATSD registers to the device tree.

Suggested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Rashmica Gupta <rashmica.g@gmail.com>
Reviewed-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
4 weeks ago skiboot v6.0.13 release notes
Stewart Smith [Wed, 14 Nov 2018 07:42:19 +0000 (18:42 +1100)] 
skiboot v6.0.13 release notes

Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
(cherry picked from commit e550528a74af7e632c359cd29e4ba295743bdb84)
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
5 weeks ago hiomap: quieten warning on failing to move a window
Stewart Smith [Thu, 8 Nov 2018 07:43:47 +0000 (18:43 +1100)] 
hiomap: quieten warning on failing to move a window

This isn't *necessarily* an error that we should complain loudly about.
If, for example, the BMC enforces the Read Only flag on a FFS partition,
opening a write window *should* fail, and we do indeed test this in
op-test.

Thus we deal with the error in a well known path: returning an error
code and then it's eventually a userspace problem.

Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
Reviewed-by: Andrew Jeffery <andrew@aj.id.au>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
5 weeks ago gcov: link in ctors* as newer GCC doesn't group them all
Stewart Smith [Thu, 8 Nov 2018 06:28:14 +0000 (17:28 +1100)] 
gcov: link in ctors* as newer GCC doesn't group them all

It seems that newer toolchains get us multiple ctors sections to link in
rather than just one. If we discard them (as we were doing), then we
don't have a working gcov build (and we get the "doesn't look sane"
warning on boot).

So, include ctors* and all is well.

Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
5 weeks ago libstb: Pass a tpm_dev to tpm_i2c_request_send()
Oliver O'Halloran [Wed, 7 Nov 2018 02:24:48 +0000 (13:24 +1100)] 
libstb: Pass a tpm_dev to tpm_i2c_request_send()

Just pass the container structure rather than bus_id and xscom_base to
tpm_i2c_request_send(). Rename xscom_base to i2c_addr while we're here
since that's just plain wrong.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
5 weeks ago libflash/ipmi-hiomap: Respect daemon presence and flash control
Andrew Jeffery [Fri, 9 Nov 2018 00:33:15 +0000 (11:03 +1030)] 
libflash/ipmi-hiomap: Respect daemon presence and flash control

Fix the fix of ORing in the BMC state - we only want to retain state
covered by the ack mask as this is something we still need to handle.
Critically, we must not retain state not covered by the ack mask as this
may lead to host firmware attempting to communicate with a dead daemon
or attempting to access the PNOR whilst the daemon is not in control of
the flash.
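As a sketch of the masking described above: the new event is taken as-is, and from the old state only the bits covered by the ack mask are carried over. `EXAMPLE_ACK_MASK` and `update_bmc_state()` are illustrative names with an arbitrary mask value, not the definitions in ipmi-hiomap.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mask of state bits that the host still has to acknowledge. */
#define EXAMPLE_ACK_MASK 0x03u

/* Combine a freshly reported BMC event with the previous state. Old bits
 * outside the ack mask (for example "daemon ready") are dropped, so they
 * cannot outlive the condition they described. */
static uint8_t update_bmc_state(uint8_t old_state, uint8_t event)
{
    return (uint8_t)(event | (old_state & EXAMPLE_ACK_MASK));
}
```

With a plain OR of old and new state, a stale non-ackable bit such as 0x80 would survive an event that no longer reports it.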

Further, add unit tests to capture the desired (and now implemented)
behaviour.

Fixes: 34cffed2ccf3 ("libflash/ipmi-hiomap: Improve event handling")
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
5 weeks ago libflash/ipmi-hiomap: Add support for unit tests
Andrew Jeffery [Fri, 9 Nov 2018 00:33:14 +0000 (11:03 +1030)] 
libflash/ipmi-hiomap: Add support for unit tests

Lay the ground work for unit testing the ipmi-hiomap implementation. The
design hooks a subset of the IPMI interface to move through a
data-driven "scenario" of IPMI message exchanges. Two basic tests are
added exercising the initialisation path of the protocol implementation.

Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
5 weeks ago libflash/ipmi-hiomap: Fix argument type warning on x86-64
Andrew Jeffery [Fri, 9 Nov 2018 00:33:13 +0000 (11:03 +1030)] 
libflash/ipmi-hiomap: Fix argument type warning on x86-64

libflash/ipmi-hiomap.c: In function ‘hiomap_window_move’:
libflash/ipmi-hiomap.c:17:21: error: format ‘%llu’ expects argument of type ‘long long unsigned int’, but argument 3 has type ‘uint64_t’ {aka ‘long unsigned int’} [-Werror=format=]
 #define pr_fmt(fmt) "HIOMAP: " fmt
                     ^~~~~~~~~~
include/skiboot.h:93:41: note: in expansion of macro ‘pr_fmt’
 #define prlog(l, f, ...) do { _prlog(l, pr_fmt(f), ##__VA_ARGS__); } while(0)
                                         ^~~~~~
include/skiboot.h:94:30: note: in expansion of macro ‘prlog’
 #define prerror(fmt...) do { prlog(PR_ERR, fmt); } while(0)
                              ^~~~~
libflash/ipmi-hiomap.c:291:3: note: in expansion of macro ‘prerror’
   prerror("Invalid window properties: len: %llu, size: %llu\n",
   ^~~~~~~
libflash/ipmi-hiomap.c:291:47: note: format string is defined here
   prerror("Invalid window properties: len: %llu, size: %llu\n",
                                            ~~~^
                                            %lu
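
One common way to silence this class of warning is the PRIu64 macro from
<inttypes.h>. The sketch below shows the technique; it is not necessarily
what this patch does, and format_window_error() is a hypothetical helper
that formats into a buffer instead of calling prerror().

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format two uint64_t values portably. PRIu64 expands to the conversion
 * that matches uint64_t on the build target, so -Werror=format is
 * satisfied whether uint64_t is unsigned long or unsigned long long.
 */
static int format_window_error(char *buf, size_t len,
			       uint64_t wlen, uint64_t size)
{
	assert(buf != NULL);
	return snprintf(buf, len,
			"Invalid window properties: len: %" PRIu64
			", size: %" PRIu64 "\n",
			wlen, size);
}
```

An explicit cast to unsigned long long with %llu would also work; PRIu64
avoids the cast.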

Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
5 weeks agolibflash/test: Rewrite Makefile.check to improve scalability
Andrew Jeffery [Fri, 9 Nov 2018 00:33:12 +0000 (11:03 +1030)] 
libflash/test: Rewrite Makefile.check to improve scalability

The current implementation makes it hard to expand the list of tests if
we want to build anything that doesn't link to mbox-server. This is a
consequence of embedding the $(LIBFLASH_TEST_EXTRA) variable inside the
recipes for building test executables, which makes the makefile a bit of
a maze to navigate.

To address this we could go the route of duplicating the
$(LIBFLASH_TEST), $(LIBFLASH_TEST_EXTRA) and the corresponding make
directives (targets/prerequisites/recipes) each time we want to link a
binary against a new set of objects, but that seems ham-fisted.

Further, $(LIBFLASH_TEST_EXTRA) is defined in terms of the relevant
object (.o) files, but the recipes it is used in otherwise use source
(.c) paths for compilation. These other paths are typically to non-test
code that needs to be compiled into the test executable, but we can't
use object files at the usual path because we will typically have a
conflict of architectures (PPC64 for the skiboot object, x86_64 for the
test object). This in turn means that we will compile source files
multiple times (once for each test binary it is required in) rather than
re-using an existing object file.

Further, the current structure of the Makefile requires we #include the
.c file under test directly into the test source if we want it in a
specific test case due to the relationship of the prerequisites to the
build (only the first source prerequisite is included in the build). The
include-the-c-file approach can have some annoying side-effects with
respect to macros, typically errors regarding redefinition. While it is
useful for testing static functions in the source under test, it would
be nice if this approach was optional rather than required.

This change attempts to address all of these issues. The outcome is we
have precise control of which objects get linked into each test binary,
we avoid the architecture clash problem, we re-use existing compiled
objects (avoiding recompilation), and we make the include-the-c-file
approach optional.

The general approach is to generate a new directory hierarchy of object
files under a `$(HOSTCC) -dumpmachine` directory in the repository root
and use these for linking the test cases. Objects that land in this
segregated tree are described by a _SOURCES variable for each test,
similar in structure and behaviour to automake's _SOURCES variables.
Again similar to automake, a check_PROGRAMS variable is used that
describes the path of each test binary to be built.

The test binary paths are mapped to the corresponding _SOURCES variable
by some secondary-evaluation wizardry that no-one has to pay any
attention to once it is written. Whilst the implementation is perhaps
slightly tricky, it allows us to avoid the recipe headache of
unconditionally linking in objects defined in variables that don't
directly participate in the target's prerequisites, and so prevents the
explosion of variables as we implement tests that require disjoint sets
of dependencies.

This is initially intended as an isolated experiment with the libflash
test makefile, but it's feasible that the scope of the concept could be
expanded to other test Makefiles.

Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
5 weeks agophb4: Update & cleanup register definitions
Benjamin Herrenschmidt [Mon, 5 Nov 2018 05:50:22 +0000 (16:50 +1100)] 
phb4: Update & cleanup register definitions

We had a bunch of remaining definitions for registers that
don't actually exist in PHB4 anymore (copied from PHB3).

This removes them along with a handful of minor style cleanups.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
6 weeks agoskiboot 6.0.11 release notes
Stewart Smith [Fri, 2 Nov 2018 07:50:05 +0000 (18:50 +1100)] 
skiboot 6.0.11 release notes

Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
(cherry picked from commit 3e2024d903ee27ad77da01f454bb2404627ba5dc)
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
6 weeks agoCI: Bump the Qemu we build for CI testing
Stewart Smith [Thu, 1 Nov 2018 07:37:12 +0000 (18:37 +1100)] 
CI: Bump the Qemu we build for CI testing

Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
6 weeks agophb4/capp: Only reset FIR bits that cause capp machine check
Vaibhav Jain [Thu, 1 Nov 2018 05:35:15 +0000 (11:05 +0530)] 
phb4/capp: Only reset FIR bits that cause capp machine check

During CAPP recovery do_capp_recovery_scoms() will reset the CAPP Fir
register just after CAPP recovery is completed. This has an
unintentional side effect of preventing PRD from analyzing and
reporting this error. If PRD tries to read the CAPP FIR after opal has
already reset it, then it logs a critical error complaining "No active
error bits found".

To prevent this from happening we update do_capp_recovery_scoms() to
only reset fir bits that cause CAPP machine check (local xstop). This
is done by reading the CAPP Fir Action0/1 & Mask registers and
generating a mask which is then written on CAPP_FIR_CLEAR register.

Cc: stable
Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
6 weeks agophb4: Check for RX errors after link training
Oliver O'Halloran [Tue, 30 Oct 2018 00:02:30 +0000 (11:02 +1100)] 
phb4: Check for RX errors after link training

Some PHB4 PHYs can get stuck in a bad state where they are constantly
retraining the link. This happens transparently to skiboot and Linux
but will cause PCIe to be slow. Resetting the PHB4 clears the
problem.

We can detect this case by looking at the RX errors count where we
check for link stability. This patch does this by modifying the link
optimal code to check for RX errors. If errors are occurring we
retrain the link irrespective of the chip rev or card.

Normally when this problem occurs, the RX error count is maxed out at
255. When there is no problem, the count is 0. We chose 8 as the max
rx errors value to give us some margin for a few errors. There is also
a knob that can be used to set the error threshold for when we should
retrain the link, i.e.:

  nvram -p ibm,skiboot --update-config phb-rx-err-max=8
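
The threshold check described above can be sketched as below. The helper
name link_needs_retrain() is hypothetical, and whether the real comparison
is strict (>) or inclusive (>=) is an assumption here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RX_ERR_MAX_DEFAULT 8 /* default threshold named in the commit message */

/* Hypothetical helper: decide whether the link should be retrained. */
static bool link_needs_retrain(uint32_t rx_err_count, uint32_t rx_err_max)
{
	/* The RX error counter is described as maxing out at 255. */
	assert(rx_err_count <= 255);
	/* Assumption: strictly more errors than the threshold triggers a retrain. */
	return rx_err_count > rx_err_max;
}
```

A healthy link (count 0) is left alone and a stuck PHY (count 255) is
retrained, with a few stray errors tolerated in between.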

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
6 weeks agocore/flash: Log return code when ffs_init() fails
Andrew Jeffery [Thu, 1 Nov 2018 12:47:27 +0000 (23:17 +1030)] 
core/flash: Log return code when ffs_init() fails

Knowing the return code is at least better than not knowing the return
code.

Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
6 weeks agolibflash/ipmi-hiomap: Use error codes rather than abort()
Andrew Jeffery [Thu, 1 Nov 2018 12:47:26 +0000 (23:17 +1030)] 
libflash/ipmi-hiomap: Use error codes rather than abort()

Admittedly the situations are pretty dire, and usually indicate a
programming failure on the BMC's part, but abort() seems a bit over the
top. The technique was useful for development but shouldn't have made it
into production.

Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
6 weeks agolibflash/ipmi-hiomap: Restore window state on window/protocol reset
Andrew Jeffery [Thu, 1 Nov 2018 12:47:25 +0000 (23:17 +1030)] 
libflash/ipmi-hiomap: Restore window state on window/protocol reset

The initial implementation of ipmi-hiomap left a bit to be desired when
it came to event handling: it didn't completely restore the state of the
system to what it was before events like a hiomap protocol or window
reset take place. The result is the host cannot recover from e.g. the
BMC being rebooted underneath it.

Take the only step required in the event of window reset, or the final
step after performing the handshake in the event of a protocol reset,
and re-open the previously active window if there was one.

Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
6 weeks agolibflash/ipmi-hiomap: Improve event handling
Andrew Jeffery [Thu, 1 Nov 2018 12:47:24 +0000 (23:17 +1030)] 
libflash/ipmi-hiomap: Improve event handling

The host firmware side of the hiomap protocol has two input sources:

1. Requests to adjust the flash mappings from itself or the kernel
2. State change events received from the BMC

The handling of BMC state change events (2.) is asynchronous in two ways:

a. The BMC pushes the state change event to the host, which is recorded
   but not acted on
b. When handling requests to adjust the flash mapping, skiboot first
   addresses any new BMC state changes before servicing the mapping
   request

Further, the hiomap protocol sends a mix of ackable and stateful events,
where ackable events are only relevant until skiboot's hiomap event
handler (b. above) cleans them up, whereas stateful events persist until
the BMC provides a subsequent state change event.

As we handle the ackable events asynchronous to receiving notification
(b. vs a. above), OR in the received event state rather than directly
assign to ensure we don't lose events that we must not miss. As an
example, without the OR we may lose ackable events if the daemon
restarts and pushes a new state change event during initialisation,
which will necessarily bear no relation to the previous state change
event value.

Similarly, don't close active windows in a. based on the event content,
as we need the window type information to handle state restoration in b.

Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
6 weeks agop9dsu: Describe platform BMC register configuration
Andrew Jeffery [Wed, 31 Oct 2018 05:24:13 +0000 (15:54 +1030)] 
p9dsu: Describe platform BMC register configuration

Provide the p9dsu-specific BMC configuration values required for the
host kernel to drive the VGA display correctly.

Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
6 weeks agop9dsu: Add HIOMAP-over-IPMI support
Andrew Jeffery [Wed, 31 Oct 2018 05:24:12 +0000 (15:54 +1030)] 
p9dsu: Add HIOMAP-over-IPMI support

Boston uses the same netfn / command values as OpenBMC.

Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
6 weeks agolibflash/ipmi-hiomap: Cleanup allocation on init failure
Andrew Jeffery [Wed, 31 Oct 2018 05:24:11 +0000 (15:54 +1030)] 
libflash/ipmi-hiomap: Cleanup allocation on init failure

Previously we were leaking the memory pointed by ctx if an IPMI error
occurred during protocol initialisation. Make sure we free the memory if
an error occurs.
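
The general shape of such a fix is sketched below. All names are
hypothetical stand-ins; the handshake callbacks simulate an IPMI exchange
that succeeds or fails.

```c
#include <assert.h>
#include <stdlib.h>

struct hiomap_ctx_sketch {
	int version;
};

/* Hypothetical handshake callback: 0 on success, nonzero on IPMI error. */
typedef int (*handshake_fn)(struct hiomap_ctx_sketch *ctx);

static int handshake_ok(struct hiomap_ctx_sketch *ctx)
{
	ctx->version = 2;
	return 0;
}

static int handshake_fail(struct hiomap_ctx_sketch *ctx)
{
	(void)ctx;
	return -1;
}

static struct hiomap_ctx_sketch *hiomap_init_sketch(handshake_fn handshake)
{
	struct hiomap_ctx_sketch *ctx;

	assert(handshake != NULL);
	ctx = calloc(1, sizeof(*ctx));
	if (!ctx)
		return NULL;
	if (handshake(ctx) != 0) {
		free(ctx); /* release the context instead of leaking it */
		return NULL;
	}
	return ctx;
}
```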

Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
6 weeks agonx: Don't abort on missing NX when using a QEMU machine
Benjamin Herrenschmidt [Thu, 1 Nov 2018 04:12:50 +0000 (15:12 +1100)] 
nx: Don't abort on missing NX when using a QEMU machine

These don't have an NX node (and probably never will) as they
don't provide any coprocessor. However, the DARN instruction
works so this abort is unnecessary.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
6 weeks agoskiboot v6.0.10 release notes
Stewart Smith [Wed, 31 Oct 2018 05:42:19 +0000 (16:42 +1100)] 
skiboot v6.0.10 release notes

Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
(cherry picked from commit b93b22df1a8b8ace4ffc080b28877fde7eaa3dde)
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
6 weeks agoRun pollers in time_wait() when not booting
Stewart Smith [Tue, 30 Oct 2018 05:34:23 +0000 (00:34 -0500)] 
Run pollers in time_wait() when not booting

This only bit us hard with hiomap in one scenario.

Our OPAL API has been that OPAL_POLL_EVENTS may be needed to make forward
progress on ongoing operations, and the internal to skiboot API has been
that time_wait() of a suitable time will run pollers (on at least one
CPU) to help ensure forward progress can be made.

In a perfect world, interrupts are used but they may a) be disabled, or
b) the thing we're doing can't use interrupts because computers are
generally terrible.

Back in 3db397ea5892a (circa 2015), we changed skiboot so that we'd run
pollers only on the boot CPU, and not if we held any locks. This was to
reduce the chance of programming code that could deadlock, as well as to
ensure that we didn't just thrash all the cachelines for running pollers
all over a large system during boot, or hard spin on the same locks on
all secondary CPUs.

The problem arises if the OS we're booting makes an OPAL call early on,
with interrupts disabled, that requires a poller to run to make forward
progress. An example of this would be OPAL_WRITE_NVRAM early in Linux
boot (where Linux sets up the partitions it wants) - something that
occurs iff we've had to reformat NVRAM this boot (i.e. first boot or
corrupted NVRAM).

The hiomap implementation should arguably *not* rely on synchronous IPMI
messages, but this is a future improvement (as it was for mbox before it).
The mbox-flash code solved this problem by spinning on check_timers().

More generically though, the approach of running the pollers when no
longer booting means we behave more in line with what the API is meant
to be, rather than have this odd case of "time_wait() for a condition
that could also be tripped by an interrupt works fine unless the OS is
up and running but hasn't set interrupts up yet".

Fixes: 529bdca0bc546a7ae3ecbd2c3134b7260072d8b0
Fixes: 3db397ea5892a8b348cf412739996731884561b3
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
6 weeks agohiomap: fix missing newline at end of 'Flushing writes' prlog()
Stewart Smith [Tue, 30 Oct 2018 05:32:57 +0000 (00:32 -0500)] 
hiomap: fix missing newline at end of 'Flushing writes' prlog()

Fixes: 529bdca0bc546a7ae3ecbd2c3134b7260072d8b0
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
7 weeks agotravis/ci: rework Dockerfiles to produce build artifacts
Stewart Smith [Sun, 28 Oct 2018 23:56:21 +0000 (10:56 +1100)] 
travis/ci: rework Dockerfiles to produce build artifacts

ubuntu-latest was also missing clang, as ubuntu-latest is closer to
ubuntu 18.04 than 16.04

Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
7 weeks agohiomap: free ipmi message in callback
Stewart Smith [Thu, 25 Oct 2018 08:15:58 +0000 (03:15 -0500)] 
hiomap: free ipmi message in callback

Otherwise we'd slowly leak memory on each hiomap operation.

Fixes: 529bdca0bc546a7ae3ecbd2c3134b7260072d8b0
Tested-by: Andrew Jeffery <andrew@aj.id.au>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
7 weeks agoRevert "TEMPORARY HACK: Disable verifying VERSION"
Stewart Smith [Thu, 25 Oct 2018 00:26:03 +0000 (11:26 +1100)] 
Revert "TEMPORARY HACK: Disable verifying VERSION"

This reverts commit f835684365273c5ff1b7c700ddc0f9c1a859363f.

Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
7 weeks agoQuieten 'warnings' now that SIO is disabled
Stewart Smith [Wed, 24 Oct 2018 07:16:12 +0000 (02:16 -0500)] 
Quieten 'warnings' now that SIO is disabled

Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
7 weeks agonpu2-opencapi: Enable presence detection on ZZ
Frederic Barrat [Mon, 15 Oct 2018 07:36:34 +0000 (09:36 +0200)] 
npu2-opencapi: Enable presence detection on ZZ

Presence detection for opencapi adapters was broken for ZZ planars v3
and below. All ZZ systems currently used in the lab have had their
planar upgraded, so we can now remove the override we had to force
presence and activate presence detection, which should improve boot
time.

Considering the state of opal support on ZZ, this is really only for
lab usage on BML. The opencapi enablement team has okay'd the
change. In the unlikely case somebody tries opencapi on an old ZZ, the
presence detection through i2c will show that no adapter is present
and skiboot won't try to access or train the link.

Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
7 weeks agocpu: Quieten OS endian switch messages
Joel Stanley [Wed, 24 Oct 2018 00:07:30 +0000 (10:37 +1030)] 
cpu: Quieten OS endian switch messages

Users see these when loading an OS from Petitboot:

 [  119.486794100,5] OPAL: Switch to big-endian OS
 [  120.022302604,5] OPAL: Switch to little-endian OS

Which is expected and doesn't provide any information the user can act
on. Switch them to PR_INFO so they still appear in the log, but not on
the serial console.

Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
7 weeks agopflash: Add --skip option for reading
Adriana Kobylak [Tue, 23 Oct 2018 18:47:29 +0000 (13:47 -0500)] 
pflash: Add --skip option for reading

Add a --skip=N option to pflash to skip N number of bytes when reading.
This would allow users to print the VERSION partition without the STB
header by specifying the --skip=4096 argument, and it's a more generic
solution rather than making pflash depend on secure/trusted boot code.

Signed-off-by: Adriana Kobylak <anoo@linux.ibm.com>
Reviewed-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
[stewart: fix up pflash test]
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
7 weeks agoexternal/mambo: Check for qtrace_utils.tcl before sourcing it
Madhavan Srinivasan [Sat, 20 Oct 2018 14:49:30 +0000 (20:19 +0530)] 
external/mambo: Check for qtrace_utils.tcl before sourcing it

Commit cb835dbdf875 ('external/mambo: conditionally source qtrace script')
added qtrace_utils.tcl sourcing in skiboot.tcl without a check to see
whether it exists in the current directory. This broke running mambo from
another directory using skiboot.tcl. Patch adds a check.

Fixes: cb835dbdf875 ('external/mambo: conditionally source qtrace script')
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
7 weeks agoplatforms/astbmc/vesnin: Send list of PCI devices to BMC through IPMI
Artem Senichev [Fri, 19 Oct 2018 11:37:51 +0000 (14:37 +0300)] 
platforms/astbmc/vesnin: Send list of PCI devices to BMC through IPMI

Implements sending a list of installed PCI devices through IPMI protocol.
Each PCI device description is sent as a standalone IPMI message.
A list of devices can be gathered from separate messages using the
session identifier. The session Id is an incremental counter that is
updated at the start of synchronization session.

Signed-off-by: Artem Senichev <a.senichev@yadro.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
7 weeks agolpc: Clear sync no-response field prior to device probe
Andrew Jeffery [Thu, 18 Oct 2018 07:56:12 +0000 (18:26 +1030)] 
lpc: Clear sync no-response field prior to device probe

Artem Senichev reported[1] his P8 platform was failing to boot from
a43e9a66aae9 ("astbmc: Fail SFC init if SIO is unavailable") with the
following error:

[  110.097168975,3] PLAT: Failed to open PNOR flash controller

I reproduced this behaviour on a Palmetto; we need to ensure the state
of the no-response error bit is clear before proceeding with the presence
test.

The fix appears to resolve the failure to open the PNOR flash controller
on Palmetto and doesn't change the expected behaviour on Witherspoon.

[1] https://github.com/open-power/skiboot/issues/197

Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
Tested-by: Artem Senichev <a.senichev@yadro.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
7 weeks agocore/device: NULL pointer dereference fix
Nicholas Piggin [Wed, 17 Oct 2018 14:45:33 +0000 (00:45 +1000)] 
core/device: NULL pointer dereference fix

This was caught with unmapped memory dereference page faults.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
7 weeks agocore/flash: NULL pointer dereference fixes
Nicholas Piggin [Wed, 17 Oct 2018 14:45:32 +0000 (00:45 +1000)] 
core/flash: NULL pointer dereference fixes

These were caught with unmapped memory dereference page faults.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
7 weeks agoSTOP API: Changes for SMF and SPR self save
Prem Shanker Jha [Tue, 16 Oct 2018 07:45:06 +0000 (13:15 +0530)] 
STOP API: Changes for SMF and SPR self save

    Commit accomplishes following:
        -   Implementation of new self restore region memory layout
        -   Restore of SPRs pertaining to SMF
        -   Self save of SPRs
        -   Backward compatibility with old self restore layout
Key_Cronus_Test=PM_REGRESS

Change-Id: I11359e392102d32896251225907eb95a43ba6f78
Reviewed-on: http://rchgit01.rchland.ibm.com/gerrit1/66212
Reviewed-by: RANGANATHPRASAD G. BRAHMASAMUDRA <prasadbgr@in.ibm.com>
Tested-by: Jenkins Server <pfd-jenkins+hostboot@us.ibm.com>
Tested-by: HWSV CI <hwsv-ci+hostboot@us.ibm.com>
Tested-by: Cronus HW CI <cronushw-ci+hostboot@us.ibm.com>
Tested-by: Hostboot CI <hostboot-ci+hostboot@us.ibm.com>
Reviewed-by: Gregory S. Still <stillgs@us.ibm.com>
Reviewed-by: Jennifer A. Stofer <stofer@us.ibm.com>
Reviewed-on: http://rchgit01.rchland.ibm.com/gerrit1/66216
Tested-by: Jenkins OP Build CI <op-jenkins+hostboot@us.ibm.com>
Tested-by: FSP CI Jenkins <fsp-CI-jenkins+hostboot@us.ibm.com>
Tested-by: Jenkins OP HW <op-hw-jenkins+hostboot@us.ibm.com>
Reviewed-by: Daniel M. Crowell <dcrowell@us.ibm.com>
Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
7 weeks agoSCOM Restore: Handle case of old HB and new STOP API case.
Prem Shanker Jha [Tue, 16 Oct 2018 07:45:05 +0000 (13:15 +0530)] 
SCOM Restore: Handle case of old HB and new STOP API case.

Commit addresses a situation where STOP API is new and HB is
old. It detects the situation and retains legacy behavior.
This situation can arise if PHYP tries to use SCOM restore
changes of STOP API with older fipsdriver or OPAL does the
same on older HB binaries.
Key_Cronus_Test=PM_REGRESS

Change-Id: Iaaa866169904a47e10c79ae4894d2eedccfafe53
Reviewed-on: http://rchgit01.rchland.ibm.com/gerrit1/62610
Tested-by: Jenkins Server <pfd-jenkins+hostboot@us.ibm.com>
Tested-by: Hostboot CI <hostboot-ci+hostboot@us.ibm.com>
Tested-by: Cronus HW CI <cronushw-ci+hostboot@us.ibm.com>
Reviewed-by: RANGANATHPRASAD G. BRAHMASAMUDRA <prasadbgr@in.ibm.com>
Reviewed-by: AMIT J. TENDOLKAR <amit.tendolkar@in.ibm.com>
Reviewed-by: Gregory S. Still <stillgs@us.ibm.com>
Reviewed-on: http://rchgit01.rchland.ibm.com/gerrit1/62614
Tested-by: Jenkins OP Build CI <op-jenkins+hostboot@us.ibm.com>
Tested-by: Jenkins OP HW <op-hw-jenkins+hostboot@us.ibm.com>
Tested-by: FSP CI Jenkins <fsp-CI-jenkins+hostboot@us.ibm.com>
Reviewed-by: Christian R. Geddes <crgeddes@us.ibm.com>
[build fixes for OPAL : Akshay Adiga]
Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2 months agoREADME: Update Qemu instructions
Joel Stanley [Mon, 15 Oct 2018 05:20:25 +0000 (15:50 +1030)] 
README: Update Qemu instructions

Qemu has evolved since this text was written. We can now run skiboot on
upstream Qemu.

Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2 months agohdata/i2c: Skip unknown device type
Vasant Hegde [Fri, 12 Oct 2018 06:25:20 +0000 (11:55 +0530)] 
hdata/i2c: Skip unknown device type

Do not add unknown I2C devices to device tree.

Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2 months agohdata/i2c: Make SPD workaround more workaroundy
Oliver O'Halloran [Mon, 24 Sep 2018 07:14:08 +0000 (17:14 +1000)] 
hdata/i2c: Make SPD workaround more workaroundy

We have a hack in the I2C device parser to fix up entries generated by
hostboot for the DIMM SPD devices. For some reason they get reported as
128Kbit EEPROMs which is bad since those have a different I2C interface
to an actual SPD device.

Oddly enough, the FSP also gets this wrong in a slightly different way.
In the FSP case they are reported as an at24c04 (4Kbit) EEPROM, which
also has a different I2C interface.

To fix both these problems, force any EEPROM we find on that bus to have
the compatible string of "spd".

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2 months agohdata/i2c: Add whitelisting for Host I2C devices
Oliver O'Halloran [Mon, 24 Sep 2018 07:14:07 +0000 (17:14 +1000)] 
hdata/i2c: Add whitelisting for Host I2C devices

Many of the devices that we get information about through HDAT are for
use by firmware rather than the host operating system. This patch adds
a boolean flag to hdat_i2c_info structure that indicates whether devices
with a given purpose should be reserved for use inside of OPAL (or some
other firmware component, such as the OCC).

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2 months agoopal/hmi: Wakeup the cpu before reading core_fir
Vaibhav Jain [Mon, 27 Aug 2018 10:13:54 +0000 (15:43 +0530)] 
opal/hmi: Wakeup the cpu before reading core_fir

When stop state 5 is enabled, reading the core_fir during an HMI can
result in a xscom read error with xscom_read() returning an
OPAL_XSCOM_PARTIAL_GOOD error code and core_fir value of all FFs. At
present this return error code is not handled in decode_core_fir()
hence the invalid core_fir value is sent to the kernel where it
interprets it as a FATAL hmi causing a system check-stop.

This can be prevented by forcing the core to wake up before
reading the core_fir. Hence this patch wraps the call to
read_core_fir() within calls to dctl_set_special_wakeup() and
dctl_clear_special_wakeup().

Suggested-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Signed-off-by: Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com>
Acked-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Reviewed-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2 months agophb4: Enable PHB MMIO-0/1 Bars only when mmio window exists
Vaibhav Jain [Thu, 23 Aug 2018 10:07:49 +0000 (15:37 +0530)] 
phb4: Enable PHB MMIO-0/1 Bars only when mmio window exists

Presently phb4_probe_stack() will always enable PHB MMIO0/1 windows
even if they don't exist in phy_map. Hence we do some minor shuffling
in the phb4_probe_stack() so that MMIO-0/1 Bars are only enabled if
their corresponding MMIO window exists in the phy_map. In case phy_map
for an mmio window is '0' we set the corresponding BAR register to
'0'.

Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Reviewed-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2 months agophb4/capp: Update the expected Eye-catcher for CAPP ucode lid
Vaibhav Jain [Mon, 3 Sep 2018 09:12:37 +0000 (14:42 +0530)] 
phb4/capp: Update the expected Eye-catcher for CAPP ucode lid

Currently on a FSP based P9 system load_capp_code() expects CAPP ucode
lid header to have eye-catcher magic of 'CAPPPSLL'. However skiboot
currently supports CAPP ucode only lids that have an eye-catcher magic
of 'CAPPLIDH'. This prevents skiboot from loading the ucode with this
error message:

CAPP: ucode header invalid

We fix this issue by updating load_capp_ucode() to use the eye-catcher
value of 'CAPPLIDH' instead of 'CAPPPSLL'.
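
An eye-catcher check of this kind can be sketched as below. The helper
name and the assumption that the magic occupies the first 8 bytes of the
lid are illustrative and not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define CAPP_EYECATCHER "CAPPLIDH" /* value used after this change */

/* Check the first 8 bytes of a lid image against the expected magic. */
static bool capp_header_valid(const void *lid, size_t len)
{
	assert(lid != NULL);
	if (len < 8)
		return false; /* too short to hold the eye-catcher */
	return memcmp(lid, CAPP_EYECATCHER, 8) == 0;
}
```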

Cc: stable
Fixes: e50764d4f2b1("capi: Load capp microcode")
Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Reviewed-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2 months agophb4/capp: Use link width to allocate STQ engines to CAPP
Vaibhav Jain [Sat, 8 Sep 2018 06:46:54 +0000 (12:16 +0530)] 
phb4/capp: Use link width to allocate STQ engines to CAPP

Update phb4_init_capp_regs() to allocate STQ Engines to CAPP/PEC2
based on link width instead of always assuming it to be x8.

Also re-factor the function slightly to evaluate the link-width only
once and cache it so that it can also be used to allocate DMA read
engines.

Cc: stable
Fixes: 47c09cdfe7a3("phb4/capp: Calculate STQ/DMA read engines based on link-width for PEC")
Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2 months agocore/flash: Ignore prefix when comparing versions.
Samuel Mendoza-Jonas [Wed, 10 Oct 2018 06:32:40 +0000 (17:32 +1100)] 
core/flash: Ignore prefix when comparing versions.

The Skiboot version can include a "skiboot-" prefix if built with
something like Buildroot. The property being compared against won't
include this so ignore it.
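
A prefix-insensitive comparison can be sketched as below. Both function
names are hypothetical; only the "skiboot-" prefix comes from the commit
message.

```c
#include <assert.h>
#include <string.h>

/* Return the version string with an optional "skiboot-" prefix skipped. */
static const char *skip_version_prefix(const char *version)
{
	static const char prefix[] = "skiboot-";

	assert(version != NULL);
	if (strncmp(version, prefix, sizeof(prefix) - 1) == 0)
		return version + sizeof(prefix) - 1;
	return version;
}

/* Compare the built version against the property, ignoring the prefix. */
static int versions_match(const char *built, const char *property)
{
	assert(property != NULL);
	return strcmp(skip_version_prefix(built), property) == 0;
}
```

A Buildroot-style "skiboot-v6.2" and a bare "v6.2" then compare as equal.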

Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>