Ignore HRESET_NOT_READY state until HRESET completes

After an HRESET has been requested, the code will wait for an HRESET_READY
or HRESET_FAILED status before attempting OCC communication again.

The code also will not clear outstandingHReset until READY or FAILED is
received, since the reset is still in progress until then.
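The following is a minimal sketch of the status handling described above. It is not the openpower-occ-control implementation. It assumes a single OCC, and `HResetStatus`, `HResetTracker`, `requestReset()` and `onStatus()` are hypothetical names. Only `outstandingHReset` comes from this commit. While a reset is pending, NOT_READY updates are ignored, and the outstanding flag is cleared only on READY or FAILED.

```cpp
#include <cassert>

// Hypothetical status values mirroring the HRESET_* states named above;
// the numeric encoding used by the real PLDM sensor is not assumed here.
enum class HResetStatus
{
    NotReady,
    Ready,
    Failed
};

// Hypothetical tracker for one OCC's outstanding HRESET request.
class HResetTracker
{
  public:
    // Record that an HRESET has been requested for this OCC.
    void requestReset()
    {
        outstandingHReset = true;
    }

    // Process a status update. Returns true only when a pending reset has
    // finished (READY or FAILED). NOT_READY is ignored while a reset is
    // pending, because the reset is still in progress.
    bool onStatus(HResetStatus status)
    {
        if (!outstandingHReset)
        {
            return false; // assumption: no reset requested, nothing to track
        }
        if (status == HResetStatus::NotReady)
        {
            return false; // keep waiting; do not clear the flag
        }
        outstandingHReset = false; // READY or FAILED ends the reset
        return true;
    }

    bool resetInProgress() const
    {
        return outstandingHReset;
    }

  private:
    bool outstandingHReset = false;
};
```

A NOT_READY update leaves the tracker in the "in progress" state, so any number of them can arrive before the final READY or FAILED.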

OCC communication is disabled before the HRESET and re-enabled if the
reset completes successfully. If the reset fails, no further
communication will work.
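The following is a hypothetical sketch of the communication gating described above. `OccCommGate`, `beforeHReset()`, `afterHReset()` and `HResetResult` are illustrative names and are not part of openpower-occ-control. Communication is closed before the reset is requested and reopened only when the reset succeeds.

```cpp
#include <cassert>

// Hypothetical outcome of a completed HRESET.
enum class HResetResult
{
    Succeeded,
    Failed
};

// Hypothetical per-OCC communication gate (illustration only).
class OccCommGate
{
  public:
    // Called just before the HRESET is requested: stop talking to the OCC.
    void beforeHReset()
    {
        commEnabled = false;
    }

    // Called once the reset has completed (READY or FAILED).
    // Only a successful reset re-enables communication. After a failure
    // the gate stays closed, so no further commands are sent.
    void afterHReset(HResetResult result)
    {
        commEnabled = (result == HResetResult::Succeeded);
    }

    bool canCommunicate() const
    {
        return commEnabled;
    }

  private:
    bool commEnabled = true;
};
```

After a failure the gate stays closed instead of retrying. This matches the behavior stated above, where no further communication is expected to work once an HRESET fails.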

Testing showed that PLDM instance IDs were not being freed automatically
when a response was received, so this change also frees those IDs when
the response arrives.
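This self-contained sketch shows why the IDs must be released. It models an instance ID pool in memory and does not call libpldm. `InstanceIdPool`, `release()` and `onPldmResponse()` are hypothetical names. PLDM instance IDs are 5-bit values, so a requester that never frees them runs out after 32 outstanding requests.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical in-memory stand-in for a PLDM instance ID pool.
// Instance IDs are 5 bits wide, giving 32 possible values (0-31).
class InstanceIdPool
{
  public:
    // Hand out the lowest free ID, or nullopt if all 32 are in use.
    std::optional<std::uint8_t> alloc()
    {
        for (std::size_t id = 0; id < inUse.size(); ++id)
        {
            if (!inUse.test(id))
            {
                inUse.set(id);
                return static_cast<std::uint8_t>(id);
            }
        }
        return std::nullopt; // pool exhausted: IDs were never released
    }

    // Return an ID to the pool so a later request can reuse it.
    void release(std::uint8_t id)
    {
        inUse.reset(id);
    }

    std::size_t outstanding() const
    {
        return inUse.count();
    }

  private:
    std::bitset<32> inUse;
};

// Hypothetical response handler: once the response for `id` has been
// received, free the ID so it is not leaked.
void onPldmResponse(InstanceIdPool& pool, std::uint8_t id)
{
    // ... decode the response here ...
    pool.release(id);
}
```

Because the ID is released in the response path, the next request can reuse the lowest free ID instead of exhausting the pool.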

Tested on Rainier with recoverable and unrecoverable SBE injects.

'''
Feb 13 18:33:29 p10bmc openpower-occ-control[22740]: readOccState: Failed to read OCC0 state: Read error on I/O operation -  failbit badbit
Feb 13 18:33:29 p10bmc openpower-occ-control[22740]: Status::readOccState: open/read failed trying to read OCC0 state (open errno=0)
Feb 13 18:33:29 p10bmc openpower-occ-control[22740]: readOccState: Failed to read OCC0 state: Read error on I/O operation -  failbit badbit
Feb 13 18:33:29 p10bmc openpower-occ-control[22740]: Status::readOccState: open/read failed trying to read OCC0 state (open errno=11)
Feb 13 18:33:29 p10bmc openpower-occ-control[22740]: SBE timeout, requesting HRESET (OCC0)
Feb 13 18:33:29 p10bmc openpower-occ-control[22740]: Status::occActive OCC0 changed to False
Feb 13 18:33:29 p10bmc openpower-occ-control[22740]: got id 15 and set PldmInstanceId to 15
Feb 13 18:33:29 p10bmc openpower-occ-control[22740]: openMctpDemuxTransport: pldmFd has fd=9
Feb 13 18:33:29 p10bmc openpower-occ-control[22740]: sendPldm: calling pldm_transport_send_msg(OCC0, instance:15, 8 bytes, timeout 30)
Feb 13 18:33:29 p10bmc openpower-occ-control[22740]: pldmResetCallback: calling pldm_transport_recv_msg() instance:15
Feb 13 18:33:29 p10bmc openpower-occ-control[22740]: pldmResetCallback: pldm_transport_recv_msg() rsp was 4 bytes
Feb 13 18:33:29 p10bmc openpower-occ-control[22740]: pldmResetCallback: Reset has been successfully started
Feb 13 18:33:29 p10bmc openpower-occ-control[22740]: Freed PLDM instance ID 15
Feb 13 18:33:29 p10bmc openpower-occ-control[22740]: pldm: HRESET is NOT READY (OCC0)
Feb 13 18:34:30 p10bmc openpower-occ-control[22740]: HRESET succeeded (OCC0)
Feb 13 18:34:30 p10bmc openpower-occ-control[22740]: Status::occActive OCC0 changed to True
Feb 13 18:34:30 p10bmc openpower-occ-control[22740]: validateOccMaster: OCC0 is master of 4 OCCs
Feb 13 18:34:34 p10bmc openpower-occ-control[22740]: Status::readOccState: OCC0 state 0x3 (lastState: 0x0)
Feb 13 18:34:34 p10bmc openpower-occ-control[22740]: PowerMode::sendModeChange: SET_MODE(12,0) command to OCC0 (9 bytes)
Feb 13 18:34:34 p10bmc openpower-occ-control[22740]: Idle Power Saver Parameters: enabled:True, enter:8%/240s, exit:12%/10s
Feb 13 18:34:34 p10bmc openpower-occ-control[22740]: PowerMode::sendIpsData: SET_CFG_DATA[IPS] command to OCC0 (12 bytes)
Feb 13 18:34:34 p10bmc openpower-occ-control[22740]: Status::readOccState: successfully read OCC0 state: 3
'''

Change-Id: I7e5bc60576e4e8fa6cba4253be535220cb8048ec
Signed-off-by: Chris Cain <cjcain@us.ibm.com>
2 files changed
tree: 2bbc332189583213c2bf765a9dac45519c59c9cf
  1. example/
  2. service_files/
  3. subprojects/
  4. test/
  5. .clang-format
  6. .gitignore
  7. app.cpp
  8. file.hpp
  9. i2c_occ.cpp
  10. i2c_occ.hpp
  11. LICENSE
  12. meson.build
  13. meson.options
  14. occ-active.sh
  15. occ_command.cpp
  16. occ_command.hpp
  17. occ_dbus.cpp
  18. occ_dbus.hpp
  19. occ_device.cpp
  20. occ_device.hpp
  21. occ_errors.cpp
  22. occ_errors.hpp
  23. occ_events.hpp
  24. occ_ffdc.cpp
  25. occ_ffdc.hpp
  26. occ_manager.cpp
  27. occ_manager.hpp
  28. occ_pass_through.cpp
  29. occ_pass_through.hpp
  30. occ_presence.cpp
  31. occ_presence.hpp
  32. occ_sensor.mako.hpp
  33. occ_status.cpp
  34. occ_status.hpp
  35. OWNERS
  36. pldm.cpp
  37. pldm.hpp
  38. powercap.cpp
  39. powercap.hpp
  40. powermode.cpp
  41. powermode.hpp
  42. README.md
  43. sensor_gen.py
  44. utils.cpp
  45. utils.hpp
README.md

OpenPOWER OCC Control Service

This service handles communication with the On-Chip Controller (OCC) on Power processors. The OCC provides processor and memory temperatures, power readings, power cap support, system power mode support, and idle power saver support. OCC Control interfaces with the OCC to collect temperature and power readings, update the system power mode, and set power caps and idle power saver parameters.

The service is started automatically when the BMC is started.

Build Project

This project can be built with Meson. The typical Meson workflow is: meson setup builddir && ninja -C builddir.

Server

The server will start automatically after the BMC is powered on.

Server status: systemctl status org.open_power.OCC.Control.service

To restart the service: systemctl restart org.open_power.OCC.Control.service

Configuration

Service files are located in the service_files subdirectory.

References

Power10

IBM EnergyScale for Power10 Processor-Based Systems whitepaper: https://www.ibm.com/downloads/cas/E7RL9N4E

OCC Firmware Interface Spec for Power10: https://github.com/open-power/docs/blob/P10/occ/OCC_P10_FW_Interfaces_v1_17.pdf

OCC Firmware: https://github.com/open-power/occ/tree/master-p10

Power9

IBM EnergyScale for POWER9 Processor-Based Systems: https://www-01.ibm.com/common/ssi/cgi-bin/ssialias?htmlfid=49019149USEN&

OCC Firmware Interface Spec for POWER9: https://github.com/open-power/docs/blob/P9/occ/OCC_P9_FW_Interfaces.pdf