OpenBMC RAS test cases.
Automated 11 host-related RAS test cases covering recoverable errors with
threshold limits 1, 5 and 32, and unrecoverable errors.
Error injection for the following:
MCIFIR - RECV1, UE.
MCACALFIR - RECV1, RECV32, UE.
NXDMAENGFIR - RECV1, RECV32.
CXAFIR - RECV5, RECV32.
OB_LFIR - RECV32.
NPU0FIR - RECV32.
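As a rough illustration of the list above (a sketch only, not part of this
patch): the keys below are copied from ERROR_INJECT_DICT in
lib/ras/variables.py, while `injection_count` is a hypothetical helper. It
assumes that a RECV<n> suffix means the error is injected n times and that UE
means a single injection. The tests themselves pass the threshold explicitly.

```python
import re

# Keys copied from ERROR_INJECT_DICT in lib/ras/variables.py (this patch).
KEYS = [
    "MCACALIFIR_RECV1", "MCACALIFIR_RECV32", "MCACALIFIR_UE",
    "MCS_RECV1", "MCS_UE",
    "NX_RECV1", "NX_RECV32",
    "CXA_RECV5", "CXA_RECV32",
    "OBUS_RECV32", "NPU0_RECV32",
]


def injection_count(key):
    """Hypothetical helper: number of injections implied by a key name.

    Assumption: RECV<n> is a recoverable error injected n times to reach
    the threshold; UE (unrecoverable error) is injected once.
    """
    suffix = key.rsplit("_", 1)[1]
    if suffix == "UE":
        return 1
    match = re.fullmatch(r"RECV(\d+)", suffix)
    if match is None:
        raise ValueError(f"unrecognised key suffix: {suffix}")
    return int(match.group(1))


# One entry per automated test case.
counts = {key: injection_count(key) for key in KEYS}
```

Here `len(counts)` matches the 11 test cases in the suite.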
Resolves openbmc/openbmc-test-automation#906
Change-Id: Ia3bb63bf9776b93285e938b36a47543a546fcbbd
Signed-off-by: Sridevi Ramesh <sridevra@in.ibm.com>
diff --git a/extended/test_host_ras.robot b/extended/test_host_ras.robot
new file mode 100755
index 0000000..1a2b11d
--- /dev/null
+++ b/extended/test_host_ras.robot
@@ -0,0 +1,286 @@
+*** Settings ***
+Documentation    This suite tests RAS error injection through HOST.
+Resource ../lib/utils.robot
+Resource ../lib/openbmc_ffdc.robot
+Resource ../lib/ras/host_utils.robot
+Resource ../lib/resource.txt
+Resource ../lib/state_manager.robot
+Resource ../lib/openbmc_ffdc_methods.robot
+Resource ../lib/boot_utils.robot
+Variables ../lib/ras/variables.py
+
+Library DateTime
+Library OperatingSystem
+
+Suite Setup RAS Suite Setup
+Test Setup RAS Test Setup
+Test Teardown FFDC On Test Case Fail
+Suite Teardown RAS Suite Cleanup
+
+*** Variables ***
+${stack_mode} normal
+
+*** Test Cases ***
+
+# Memory channel (MCACALFIR) related error injection.
+
+Verify Recoverable Callout Handling For MCA With Threshold 1
+    [Documentation]    Verify recoverable callout handling for MCACALFIR with
+ ... threshold 1.
+ [Tags] Verify_Recoverable_Callout_Handling_For_MCA_With_Threshold_1
+
+ ${value}= Get From Dictionary ${ERROR_INJECT_DICT} MCACALIFIR_RECV1
+ ${err_log_path}= Catenate ${RAS_LOG_DIR_PATH}mcacalfir_th1
+ Inject Recoverable Error With Threshold Limit Through Host
+ ... ${value[0]} ${value[1]} 1 ${value[2]} ${err_log_path}
+
+Verify Recoverable Callout Handling For MCA With Threshold 32
+    [Documentation]    Verify recoverable callout handling for MCACALFIR with
+ ... threshold 32.
+ [Tags] Verify_Recoverable_Callout_Handling_For_MCA_With_Threshold_32
+
+ ${value}= Get From Dictionary ${ERROR_INJECT_DICT} MCACALIFIR_RECV32
+ ${err_log_path}= Catenate ${RAS_LOG_DIR_PATH}mcacalfir_th32
+ Inject Recoverable Error With Threshold Limit Through Host
+ ... ${value[0]} ${value[1]} 32 ${value[2]} ${err_log_path}
+
+
+Verify Unrecoverable Callout Handling For MCA
+    [Documentation]    Verify unrecoverable callout handling for MCACALFIR.
+ [Tags] Verify_Unrecoverable_Callout_Handling_For_MCA
+
+ ${value}= Get From Dictionary ${ERROR_INJECT_DICT} MCACALIFIR_UE
+ ${err_log_path}= Catenate ${RAS_LOG_DIR_PATH}mcacalfir
+ Inject Unrecoverable Error Through Host
+ ... ${value[0]} ${value[1]} 1 ${value[2]} ${err_log_path}
+
+# Memory buffer (MCIFIR) related error injection.
+
+Verify Recoverable Callout Handling For MCI With Threshold 1
+    [Documentation]    Verify recoverable callout handling for MCI with
+ ... threshold 1.
+ [Tags] Verify_Recoverable_Callout_Handling_For_MCI_With_Threshold_1
+
+ ${value}= Get From Dictionary ${ERROR_INJECT_DICT} MCS_RECV1
+ ${err_log_path}= Catenate ${RAS_LOG_DIR_PATH}mcifir_th1
+ Inject Recoverable Error With Threshold Limit Through Host
+ ... ${value[0]} ${value[1]} 1 ${value[2]} ${err_log_path}
+
+Verify Unrecoverable Callout Handling For MCI
+    [Documentation]    Verify unrecoverable callout handling for MCI.
+ [Tags] Verify_Unrecoverable_Callout_Handling_For_MCI
+
+ ${value}= Get From Dictionary ${ERROR_INJECT_DICT} MCS_UE
+ ${err_log_path}= Catenate ${RAS_LOG_DIR_PATH}mcifir
+ Inject Unrecoverable Error Through Host
+ ... ${value[0]} ${value[1]} 1 ${value[2]} ${err_log_path}
+
+# Nest accelerator (NXDMAENGFIR) related error injection.
+
+Verify Recoverable Callout Handling For NXDMAENG With Threshold 1
+ [Documentation] Verify recoverable callout handling for NXDMAENG with
+ ... threshold 1.
+ [Tags] Verify_Recoverable_Callout_Handling_For_NXDMAENG_With_Threshold_1
+
+ ${value}= Get From Dictionary ${ERROR_INJECT_DICT} NX_RECV1
+ ${err_log_path}= Catenate ${RAS_LOG_DIR_PATH}nxfir_th1
+ Inject Recoverable Error With Threshold Limit Through Host
+ ... ${value[0]} ${value[1]} 1 ${value[2]} ${err_log_path}
+
+
+Verify Recoverable Callout Handling For NXDMAENG With Threshold 32
+ [Documentation] Verify recoverable callout handling for NXDMAENG with
+ ... threshold 32.
+ [Tags] Verify_Recoverable_Callout_Handling_For_NXDMAENG_With_Threshold_32
+
+ ${value}= Get From Dictionary ${ERROR_INJECT_DICT} NX_RECV32
+ ${err_log_path}= Catenate ${RAS_LOG_DIR_PATH}nxfir_th32
+ Inject Recoverable Error With Threshold Limit Through Host
+ ... ${value[0]} ${value[1]} 32 ${value[2]} ${err_log_path}
+
+# CAPP accelerator (CXAFIR) related error injection.
+
+Verify Recoverable Callout Handling For CXA With Threshold 5
+ [Documentation] Verify recoverable callout handling for CXA with
+ ... threshold 5.
+ [Tags] Verify_Recoverable_Callout_Handling_For_CXA_With_Threshold_5
+
+ ${value}= Get From Dictionary ${ERROR_INJECT_DICT} CXA_RECV5
+ ${err_log_path}= Catenate ${RAS_LOG_DIR_PATH}cxafir_th5
+ Inject Recoverable Error With Threshold Limit Through Host
+ ... ${value[0]} ${value[1]} 5 ${value[2]} ${err_log_path}
+
+Verify Recoverable Callout Handling For CXA With Threshold 32
+ [Documentation] Verify recoverable callout handling for CXA with
+ ... threshold 32.
+ [Tags] Verify_Recoverable_Callout_Handling_For_CXA_With_Threshold_32
+
+ ${value}= Get From Dictionary ${ERROR_INJECT_DICT} CXA_RECV32
+ ${err_log_path}= Catenate ${RAS_LOG_DIR_PATH}cxafir_th32
+ Inject Recoverable Error With Threshold Limit Through Host
+ ... ${value[0]} ${value[1]} 32 ${value[2]} ${err_log_path}
+
+# OBUS (OB_LFIR) related error injection.
+
+Verify Recoverable Callout Handling For OBUS With Threshold 32
+ [Documentation] Verify recoverable callout handling for OBUS with
+ ... threshold 32.
+ [Tags] Verify_Recoverable_Callout_Handling_For_OBUS_With_Threshold_32
+
+ ${value}= Get From Dictionary ${ERROR_INJECT_DICT} OBUS_RECV32
+ ${err_log_path}= Catenate ${RAS_LOG_DIR_PATH}obusfir_th32
+ Inject Recoverable Error With Threshold Limit Through Host
+ ... ${value[0]} ${value[1]} 32 ${value[2]} ${err_log_path}
+
+# NVLink processing unit (NPU0FIR) related error injection.
+
+Verify Recoverable Callout Handling For NPU0 With Threshold 32
+ [Documentation] Verify recoverable callout handling for NPU0 with
+ ... threshold 32.
+ [Tags] Verify_Recoverable_Callout_Handling_For_NPU0_With_Threshold_32
+
+ ${value}= Get From Dictionary ${ERROR_INJECT_DICT} NPU0_RECV32
+ ${err_log_path}= Catenate ${RAS_LOG_DIR_PATH}npu0fir_th32
+ Inject Recoverable Error With Threshold Limit Through Host
+ ... ${value[0]} ${value[1]} 32 ${value[2]} ${err_log_path}
+
+*** Keywords ***
+
+Inject Error Through HOST
+    [Documentation]    Inject error on processor through HOST.
+ ... Test sequence:
+ ... 1. Boot To HOST
+ ... 2. Clear any existing gard records
+ ... 3. Inject Error on processor/centaur
+ [Arguments] ${fri} ${chip_address} ${threshold_limit}
+ # Description of argument(s):
+    # fri                FIR (Fault Isolation Register) value (e.g. 2011400).
+    # chip_address       Chip address (e.g. 2000000000000000).
+    # threshold_limit    Threshold limit (e.g. 1, 5, 32).
+
+ Delete Error Logs
+ Login To OS Host
+ Gard Operations On OS clear all
+
+ # Fetch processor chip IDs.
+ ${chip_ids}= Get ChipID From OS Processor
+ ${proc_ids}= Split String ${chip_ids}
+ ${proc_id}= Get From List ${proc_ids} 1
+
+ ${threshold_limit}= Convert To Integer ${threshold_limit}
+ :FOR ${i} IN RANGE ${threshold_limit}
+ \ Run Keyword Putscom Through OS ${proc_id} ${fri} ${chip_address}
+ # Adding delay after each error injection.
+ \ Sleep 3s
+ # Adding delay to get error log after error injection.
+ Sleep 20s
+
+Verify And Clear Gard Records On HOST
+ [Documentation] Verify And Clear gard records on HOST.
+
+ Login To OS Host
+ ${output}= Gard Operations On OS list
+    Should Not Contain    ${output}    No GARD entries to display
+ Gard Operations On OS clear all
+
+Verify Error Log Entry
+ [Documentation] Verify error log entry & signature description.
+ [Arguments] ${signature_desc} ${log_prefix}
+ # Description of argument(s):
+ # signature_desc Error log signature description.
+ # log_prefix Log path prefix.
+
+ ${resp}= OpenBMC Get Request ${BMC_LOGGING_ENTRY}/list
+ Should Not Be Equal As Strings ${resp.status_code} ${HTTP_NOT_FOUND}
+
+ Collect eSEL Log ${log_prefix}
+ ${error_log_file_path}= Catenate ${log_prefix}esel.txt
+ ${rc} ${output} = Run and Return RC and Output
+ ... grep ${signature_desc} ${error_log_file_path}
+ Should Not Be Empty ${output}
+
+Inject Recoverable Error With Threshold Limit Through Host
+ [Documentation] Inject and verify recoverable error on processor through
+ ... host.
+ ... Test sequence:
+ ... 1. Enable Auto Reboot Setting
+ ... 2. Inject Error on processor/centaur
+ ... 3. Check If HOST is running.
+    ...                4. Verify that no gard records exist.
+    ...                5. Verify error log entry & signature description.
+ [Arguments] ${fri} ${chip_address} ${threshold_limit}
+ ... ${signature_desc} ${log_prefix}
+ # Description of argument(s):
+    # fri                FIR (Fault Isolation Register) value (e.g. 2011400).
+    # chip_address       Chip address (e.g. 2000000000000000).
+    # threshold_limit    Threshold limit (e.g. 1, 5, 32).
+ # signature_desc Error log signature description.
+ # log_prefix Log path prefix.
+
+ Set Auto Reboot 1
+ Inject Error Through HOST ${fri} ${chip_address} ${threshold_limit}
+ Is Host Running
+ ${output}= Gard Operations On OS list
+ Should Contain ${output} No GARD
+ Verify Error Log Entry ${signature_desc} ${log_prefix}
+
+
+Inject Unrecoverable Error Through Host
+    [Documentation]    Inject and verify unrecoverable error on processor through
+ ... host.
+ ... Test sequence:
+ ... 1. Enable Auto Reboot Setting
+ ... 2. Inject Error on processor/centaur
+ ... 3. Check If HOST is rebooted.
+    ...                4. Verify & clear gard records.
+    ...                5. Verify error log entry & signature description.
+ [Arguments] ${fri} ${chip_address} ${threshold_limit}
+ ... ${signature_desc} ${log_prefix}
+ # Description of argument(s):
+    # fri                FIR (Fault Isolation Register) value (e.g. 2011400).
+    # chip_address       Chip address (e.g. 2000000000000000).
+    # threshold_limit    Threshold limit (e.g. 1, 5, 32).
+ # signature_desc Error Log signature description.
+ # (e.g 'mcs(n0p0c0) (MCFIR[0]) mc internal recoverable')
+ # log_prefix Log path prefix.
+
+ Set Auto Reboot 1
+ Inject Error Through HOST ${fri} ${chip_address} ${threshold_limit}
+ Wait Until Keyword Succeeds 500 sec 20 sec Is Host Rebooted
+ Wait for OS
+ Verify And Clear Gard Records On HOST
+ Verify Error Log Entry ${signature_desc} ${log_prefix}
+
+
+RAS Test SetUp
+    [Documentation]    Validate input parameters and boot the host to OS.
+
+ Should Not Be Empty
+ ... ${OS_HOST} msg=You must provide DNS name/IP of the OS host.
+ Should Not Be Empty
+ ... ${OS_USERNAME} msg=You must provide OS host user name.
+ Should Not Be Empty
+ ... ${OS_PASSWORD} msg=You must provide OS host user password.
+
+ # Boot to OS.
+
+ REST Power On
+
+RAS Suite Setup
+ [Documentation] Create RAS log directory to store all RAS test logs.
+
+ ${RAS_LOG_DIR_PATH}= Catenate ${EXECDIR}/RAS_logs/
+ Set Suite Variable ${RAS_LOG_DIR_PATH}
+ Create Directory ${RAS_LOG_DIR_PATH}
+ OperatingSystem.Directory Should Exist ${RAS_LOG_DIR_PATH}
+ Empty Directory ${RAS_LOG_DIR_PATH}
+
+RAS Suite Cleanup
+ [Documentation] Perform RAS suite cleanup and verify that host
+ ... boots after test suite run.
+
+ # Boot to OS.
+ REST Power On
+ Delete Error Logs
+ Login To OS Host
+ Gard Operations On OS clear all
diff --git a/extended/test_ras.robot b/extended/test_ras.robot
deleted file mode 100644
index c101abd..0000000
--- a/extended/test_ras.robot
+++ /dev/null
@@ -1,121 +0,0 @@
-*** Settings ***
-Documentation This suite tests checkstop operations through OS.
-Resource ../lib/utils.robot
-Resource ../lib/openbmc_ffdc.robot
-Resource ../lib/ras/host_utils.robot
-Resource ../lib/resource.txt
-Resource ../lib/state_manager.robot
-Test Setup RAS Test Setup
-Test Teardown FFDC On Test Case Fail
-
-*** Variables ***
-${HOST_SETTINGS} ${SETTINGS_URI}host0
-
-*** Test Cases ***
-
-Verify Channel Checkstop Through OS With Auto Reboot
-
- [Documentation] Verify Channel Checkstop (MBS FIR REG INT PROTOCOL ERROR)
- ... through OS With Auto Reboot settings enabled.
- [Tags] Verify_Channel_Checkstop_Through_OS_With_Auto_Reboot
-
- Verify Checkstop Insertion With Auto Reboot
- ... Centaur 2011400 4000000000000000
-
-
-Verify Host Reboot On Host Booted System With Auto Reboot Enabled
- [Documentation] Verify host reboot after host watchdog error on host
- ... booted system with auto reboot enabled.
- [Tags] Verify_Host_Reboot_On_Host_Booted_System_With_Auto_Reboot_Enabled
-
- Initiate Host Boot
- Wait for OS ${OS_HOST} ${OS_USERNAME} ${OS_PASSWORD}
-
- Set Auto Reboot ${1}
-
- Trigger Host Watchdog Error
-
- Wait Until Keyword Succeeds 3 min 5 sec Is Host Rebooted
- Wait for OS ${OS_HOST} ${OS_USERNAME} ${OS_PASSWORD}
-
-
-Verify Host Quiesced On Host Booted System With Auto Reboot Disabled
- [Documentation] Verify host quiesced state after host watchdog error on
- ... host booted system with auto reboot disabled.
- [Tags] Verify_Host_Quiesced_On_Host_Booted_System_With_Auto_Reboot_Disabled
-
- Initiate Host Boot
- Wait for OS ${OS_HOST} ${OS_USERNAME} ${OS_PASSWORD}
-
- Set Auto Reboot ${0}
-
- Trigger Host Watchdog Error
-
- Wait Until Keyword Succeeds 3 min 5 sec Is Host Quiesced
- Recover Quiesced Host
-
-
-*** Keywords ***
-Inject Checkstop Through OS
- [Documentation] Inject checkstop on processor/centaur through OS.
- ... Test sequence:
- ... 1. Boot To OS
- ... 2. Clear any existing gard records
- ... 3. Inject Checkstop on processor/centaur
- [Arguments] ${chip_type} ${fru} ${address}
- # Description of arguments:
- # chip_type The chip type (Processor/Centaur).
- # fru FRU value (e.g. 2011400).
- # address chip address (e.g 4000000000000000).
-
-
- Login To OS Host ${OS_HOST} ${OS_USERNAME} ${OS_PASSWORD}
- # Get core values are present through OS.
- Get Cores Values From OS
-
- Gard Operations On OS clear all
-
- # Fetch Processor/Centaur chip value based on the input chip_type.
- ${output}= Get ChipID From OS ${chip_type}
- ${chip_values}= Split String ${output}
- ${chip_value}= Get From List ${chip_values} 0
-
- Putscom Through OS ${chip_value} ${fru} ${address}
-
-Verify And Clear Gard Records On OS
- [Documentation] Verify And Clear gard records on OS.
-
- Login To OS Host ${OS_HOST} ${OS_USERNAME} ${OS_PASSWORD}
- ${output}= Gard Operations On OS list
- Should Not Contain ${output} 'No GARD entries to display'
- Gard Operations On OS clear all
-
-Verify Checkstop Insertion With Auto Reboot
- [Documentation] Inject and verify checkstop on processor/centaur through
- ... OS with auto reboot.
- ... Test sequence:
- ... 1. Enable Auto Reboot Setting
- ... 2. Inject Checkstop on processor/centaur
- ... 3. Check If HOST rebooted and OS is up
- ... 4. Verify & clear gard records
- [Arguments] ${chip_type} ${fru} ${address}
- # Description of arguments:
- # chip_type The chip type (Processor/Centaur).
- # fru FRU value (e.g. 2011400).
- # address chip address (e.g 4000000000000000).
-
- Set Auto Reboot ${1}
- Inject Checkstop Through OS ${chip_type} ${fru} ${address}
- Wait Until Keyword Succeeds 120 sec 20 sec Is Host Rebooted
- Wait for OS ${OS_HOST} ${OS_USERNAME} ${OS_PASSWORD}
- Verify And Clear Gard Records On OS
-
-RAS Test SetUp
- [Documentation] Validates input parameters.
-
- Should Not Be Empty
- ... ${OS_HOST} msg=You must provide DNS name/IP of the OS host.
- Should Not Be Empty
- ... ${OS_USERNAME} msg=You must provide OS host user name.
- Should Not Be Empty
- ... ${OS_PASSWORD} msg=You must provide OS host user password.
diff --git a/lib/openbmc_ffdc_methods.robot b/lib/openbmc_ffdc_methods.robot
index 6359c68..45e1ca7 100755
--- a/lib/openbmc_ffdc_methods.robot
+++ b/lib/openbmc_ffdc_methods.robot
@@ -319,6 +319,8 @@
Collect eSEL Log
[Documentation] Collect eSEL log from logging entry and convert eSEL data
... to elog formated string text file.
+ [Arguments] ${log_prefix_path}=${LOG_PREFIX}
+
${resp}= OpenBMC Get Request ${BMC_LOGGING_ENTRY}/enumerate quiet=${1}
${status}= Run Keyword And Return Status
... Should Be Equal As Strings ${resp.status_code} ${HTTP_OK}
@@ -332,7 +334,7 @@
# /xyz/openbmc_project/logging/entry/2
${esel_list}= Get Dictionary Keys ${content['data']}
- ${logpath}= Catenate SEPARATOR= ${LOG_PREFIX} esel
+ ${logpath}= Catenate SEPARATOR= ${log_prefix_path} esel
Create File ${logpath}
# Fetch data from /xyz/openbmc_project/logging/entry/1/attr/AdditionalData
# "ESEL=00 00 df 00 00 00 00 20 00 04 12 35 6f aa 00 00 "
diff --git a/lib/ras/variables.py b/lib/ras/variables.py
new file mode 100644
index 0000000..65cf5ec
--- /dev/null
+++ b/lib/ras/variables.py
@@ -0,0 +1,46 @@
+
+r"""
+Error log signature descriptions (grep patterns) for each error injection.
+"""
+
+DES_MCA_RECV1 = "'mca.n0p0c0.*MCACALFIR[^0].*A MBA recoverable error'"
+DES_MCA_RECV32 = "'mca.n0p0c0.*MCACALFIR[^2].*Excessive refreshes'"
+DES_MCA_UE = "'mca.n0p0c0.*MCACALFIR[^10].*State machine'"
+
+
+DES_MCS_RECV1 = "'mcs.n0p0c0.*MCFIR[^0].*mc internal recoverable'"
+DES_MCS_UE = "'mcs.n0p0c0.*MCFIR[^1].*mc internal non recoverable'"
+
+
+DES_NX_RECV1 = "'pu.n0p0.*NXDMAENGFIR[^5].*Channel 0 842 engine ECC'"
+DES_NX_RECV32 = "'pu.n0p0.*NXDMAENGFIR[^4].*Channel 0 842 engine ECC'"
+
+DES_OBUS_RECV32 = "'ob.n0p0c0.*OB_LFIR[^0].*CFIR internal parity error'"
+
+DES_CXA_RECV5 = "'capp.n0p0c0.*CXAFIR[^34].*CXA CE on data received'"
+DES_CXA_RECV32 = "'capp.n0p0c0.*CXAFIR[^2].*CXA CE on Master array'"
+
+DES_NPU0_RECV32 = "'pu.n0p0.*NPU0FIR[^0].*NTL array CE'"
+
+# The following is an error injection dictionary with each entry consisting of:
+# - field_name: <target type>_<threshold limit> (e.g. MCS_RECV1).
+# - A list consisting of the following fields:
+# - field1: FIR (Fault isolation register) value.
+# - field2: chip address.
+# - field3: Error log signature description.
+
+ERROR_INJECT_DICT = {'MCACALIFIR_RECV1': ['07010900', '8000000000000000',
+ DES_MCA_RECV1],
+ 'MCACALIFIR_RECV32': ['07010900', '2000000000000000',
+ DES_MCA_RECV32],
+ 'MCACALIFIR_UE': ['07010900', '0020000000000000', DES_MCA_UE],
+ 'MCS_RECV1': ['05010800', '8000000000000000', DES_MCS_RECV1],
+ 'MCS_UE': ['05010800', '4000000000000000', DES_MCS_UE],
+ 'NX_RECV1': ['02011100', '0400000000000000', DES_NX_RECV1],
+ 'NX_RECV32': ['02011100', '0800000000000000', DES_NX_RECV32],
+ 'CXA_RECV5': ['02010800', '0000000020000000', DES_CXA_RECV5],
+ 'CXA_RECV32': ['02010800', '2000000000000000', DES_CXA_RECV32],
+ 'OBUS_RECV32': ['0904000a', '8000000000000000', DES_OBUS_RECV32],
+ 'NPU0_RECV32': ['05011400', '8000000000000000', DES_NPU0_RECV32]
+ }
+