| *** Settings *** |
| Documentation Utility for RAS test scenarios through HOST & BMC. |
| Resource ../../lib/utils.robot |
| Resource ../../lib/ras/host_utils.robot |
| Resource ../../lib/resource.robot |
| Resource ../../lib/state_manager.robot |
| Resource ../../lib/boot_utils.robot |
| Variables ../../lib/ras/variables.py |
| Variables ../../data/variables.py |
| Resource ../../lib/dump_utils.robot |
| |
| Library DateTime |
| Library OperatingSystem |
| Library random |
| Library Collections |
| |
| *** Variables *** |
| ${stack_mode} normal |
| |
| *** Keywords *** |
| |
| Verify And Clear Gard Records On HOST |
| [Documentation] Verify And Clear gard records on HOST. |
| |
| ${output}= Gard Operations On OS list |
| Should Not Contain ${output} No GARD |
| Gard Operations On OS clear all |
| |
| Verify Error Log Entry |
| [Documentation] Verify error log entry & signature description. |
| [Arguments] ${signature_desc} ${log_prefix} |
| # Description of argument(s): |
| # signature_desc Error log signature description. |
| # log_prefix Log path prefix. |
| |
| # TODO: Need to move this keyword to common utility. |
| |
| Error Logs Should Exist |
| |
| Collect eSEL Log ${log_prefix} |
| ${error_log_file_path}= Catenate ${log_prefix}esel.txt |
| ${rc} ${output}= Run and Return RC and Output |
| ... grep -i ${signature_desc} ${error_log_file_path} |
| Should Be Equal ${rc} ${0} |
| Should Not Be Empty ${output} |
| |
| Inject Recoverable Error With Threshold Limit |
| [Documentation] Inject and verify recoverable error on processor through |
| ... BMC/HOST. |
| ... Test sequence: |
| ... 1. Inject recoverable error on a given target |
| ... (e.g: Processor core, CAPP, MCA) through BMC/HOST. |
| ... 2. Check If HOST is running. |
| ... 3. Verify error log entry & signature description. |
| ... 4. Verify & clear gard records. |
| [Arguments] ${interface_type} ${fir} ${chip_address} ${threshold_limit} |
| ... ${signature_desc} ${log_prefix} |
| # Description of argument(s): |
| # interface_type Inject error through 'BMC' or 'HOST'. |
| # fir FIR (Fault isolation register) value (e.g. 2011400). |
| # chip_address Chip address (e.g 2000000000000000). |
| # threshold_limit Threshold limit (e.g 1, 5, 32). |
| # signature_desc Error log signature description. |
| # log_prefix Log path prefix. |
| |
| Run Keyword Inject Error Through ${interface_type} |
| ... ${fir} ${chip_address} ${threshold_limit} ${master_proc_chip} |
| |
| Is Host Running |
| ${output}= Gard Operations On OS list |
| Should Contain ${output} No GARD |
| Verify Error Log Entry ${signature_desc} ${log_prefix} |
| # TODO: Verify SOL console logs. |
| |
| |
| Inject Unrecoverable Error |
| [Documentation] Inject and verify unrecoverable error on processor through |
| ... BMC/HOST. |
| ... Test sequence: |
| ... 1. Inject unrecoverable error on a given target |
| ... (e.g: Processor core, CAPP, MCA) through BMC/HOST. |
| ... 2. Check If HOST is rebooted. |
| ... 3. Verify & clear gard records. |
| ... 4. Verify error log entry & signature description. |
| ... 5. Verify & clear dump entry. |
| [Arguments] ${interface_type} ${fir} ${chip_address} ${threshold_limit} |
| ... ${signature_desc} ${log_prefix} ${bmc_reboot}=${0} |
| # Description of argument(s): |
| # interface_type Inject error through 'BMC' or 'HOST'. |
| # fir FIR (Fault isolation register) value (e.g. 2011400). |
| # chip_address Chip address (e.g 2000000000000000). |
| # threshold_limit Threshold limit (e.g 1, 5, 32). |
| # signature_desc Error Log signature description. |
| # (e.g 'mcs(n0p0c0) (MCFIR[0]) mc internal recoverable') |
| # log_prefix Log path prefix. |
| # bmc_reboot Do bmc reboot If bmc_reboot is set. |
| |
| Run Keyword Inject Error Through ${interface_type} |
| ... ${fir} ${chip_address} ${threshold_limit} ${master_proc_chip} |
| |
| # Do BMC Reboot after error injection. |
| Run Keyword If ${bmc_reboot} Run Keywords |
| ... Initiate BMC Reboot |
| ... Wait For BMC Ready |
| ... Initiate Host PowerOff |
| ... Initiate Host Boot |
| ... ELSE |
| ... Wait Until Keyword Succeeds 500 sec 20 sec Is Host Rebooted |
| |
| Wait for OS |
| Verify Error Log Entry ${signature_desc} ${log_prefix} |
| Read Properties ${DUMP_ENTRY_URI}list |
| Delete All BMC Dump |
| Verify And Clear Gard Records On HOST |
| |
| Fetch FIR Address Translation Value |
| [Documentation] Fetch FIR address translation value through HOST. |
| [Arguments] ${fir} ${target_type} |
| # Description of argument(s): |
| # fir FIR (Fault isolation register) value (e.g. '2011400'). |
| # core_id Core ID (e.g. '9'). |
| # target_type Target type (e.g. 'EX', 'EQ', 'C'). |
| |
| Login To OS Host |
| Copy Address Translation Utils To HOST OS |
| |
| # Fetch processor chip IDs. |
| ${proc_chip_id}= Get ProcChipId From OS Processor ${master_proc_chip} |
| # Example output: |
| # 00000000 |
| |
| ${core_ids}= Get Core IDs From OS ${proc_chip_id[-1]} |
| # Example output: |
| #./probe_cpus.sh | grep 'CHIP ID: 0' | cut -c21-22 |
| # ['14', '15', '16', '17'] |
| |
| # Ignoring master core ID. |
| ${output}= Get Slice From List ${core_ids} 1 |
| # Feth random non-master core ID. |
| ${core_ids_sub_list}= Evaluate random.sample(${core_ids}, 1) random |
| ${core_id}= Get From List ${core_ids_sub_list} 0 |
| ${translated_fir_addr}= FIR Address Translation Through HOST |
| ... ${fir} ${core_id} ${target_type} |
| |
| [Return] ${translated_fir_addr} |
| |
| RAS Test SetUp |
| [Documentation] Validates input parameters. |
| |
| Should Not Be Empty |
| ... ${OS_HOST} msg=You must provide DNS name/IP of the OS host. |
| Should Not Be Empty |
| ... ${OS_USERNAME} msg=You must provide OS host user name. |
| Should Not Be Empty |
| ... ${OS_PASSWORD} msg=You must provide OS host user password. |
| |
| Smart Power Off |
| |
| # Boot to OS. |
| REST Power On quiet=${1} |
| # Adding delay after host bring up. |
| Sleep 60s |
| |
| RAS Suite Setup |
| [Documentation] Create RAS log directory to store all RAS test logs. |
| |
| ${RAS_LOG_DIR_PATH}= Catenate ${EXECDIR}/RAS_logs/ |
| Set Suite Variable ${RAS_LOG_DIR_PATH} |
| Set Suite Variable ${master_proc_chip} False |
| |
| Create Directory ${RAS_LOG_DIR_PATH} |
| OperatingSystem.Directory Should Exist ${RAS_LOG_DIR_PATH} |
| Empty Directory ${RAS_LOG_DIR_PATH} |
| |
| Should Not Be Empty ${ESEL_BIN_PATH} |
| Set Environment Variable PATH %{PATH}:${ESEL_BIN_PATH} |
| |
| # Boot to Os. |
| REST Power On quiet=${1} |
| |
| # Check Opal-PRD service enabled on host. |
| ${opal_prd_state}= Is Opal-PRD Service Enabled |
| Run Keyword If '${opal_prd_state}' == 'disabled' |
| ... Enable Opal-PRD Service On HOST |
| |
| RAS Suite Cleanup |
| [Documentation] Perform RAS suite cleanup and verify that host |
| ... boots after test suite run. |
| |
| # Boot to OS. |
| REST Power On |
| Delete Error Logs |
| Gard Operations On OS clear all |
| |
| |
| Inject Error At HOST Boot Path |
| |
| [Documentation] Inject and verify recoverable error on processor through |
| ... BMC using pdbg tool at HOST Boot path. |
| ... Test sequence: |
| ... 1. Inject error on a given target |
| ... (e.g: Processor core, CAPP, MCA) through BMC using |
| ... pdbg tool at HOST Boot path. |
| ... 2. Check If HOST is rebooted and running. |
| ... 3. Verify error log entry & signature description. |
| ... 4. Verify & clear gard records. |
| [Arguments] ${fir} ${chip_address} ${signature_desc} ${log_prefix} |
| # Description of argument(s): |
| # fir FIR (Fault isolation register) value (e.g. 2011400). |
| # chip_address Chip address (e.g 2000000000000000). |
| # signature_desc Error log signature description. |
| # log_prefix Log path prefix. |
| |
| Inject Error Through BMC At HOST Boot ${fir} ${chip_address} |
| |
| Wait Until Keyword Succeeds 500 sec 20 sec Is Host Rebooted |
| Wait for OS |
| Verify Error Log Entry ${signature_desc} ${log_prefix} |
| Verify And Clear Gard Records On HOST |