*** Settings ***
Documentation    Utility for RAS test scenarios through HOST & BMC.
Resource         ../../lib/utils.robot
Resource         ../../lib/ras/host_utils.robot
Resource         ../../lib/resource.robot
Resource         ../../lib/state_manager.robot
Resource         ../../lib/boot_utils.robot
Variables        ../../lib/ras/variables.py
Variables        ../../data/variables.py
Resource         ../../lib/dump_utils.robot

Library          DateTime
Library          OperatingSystem
Library          random
Library          Collections

*** Variables ***
${stack_mode}    normal


*** Keywords ***

Verify And Clear Gard Records On HOST
    [Documentation]    Verify that gard records exist on HOST and clear them.

    ${output}=    Gard Operations On OS    list
    Should Not Contain    ${output}    No GARD
    Gard Operations On OS    clear all

Verify Error Log Entry
    [Documentation]    Verify error log entry & signature description.
    [Arguments]    ${signature_desc}    ${log_prefix}

    # Description of argument(s):
    # signature_desc    Error log signature description.
    # log_prefix        Log path prefix.

    # TODO: Need to move this keyword to common utility.

    Error Logs Should Exist

    Collect eSEL Log    ${log_prefix}
    ${error_log_file_path}=    Catenate    ${log_prefix}esel.txt
    ${rc}    ${output}=    Run and Return RC and Output
    ...    grep -i ${signature_desc} ${error_log_file_path}
    Should Be Equal    ${rc}    ${0}
    Should Not Be Empty    ${output}

Inject Recoverable Error With Threshold Limit
    [Documentation]    Inject and verify recoverable error on processor through
    ...                BMC/HOST.
    ...                Test sequence:
    ...                1. Inject recoverable error on a given target
    ...                (e.g. processor core, CAPP, MCA) through BMC/HOST.
    ...                2. Check if HOST is running.
    ...                3. Verify that no gard records exist.
    ...                4. Verify error log entry & signature description.
    [Arguments]    ${interface_type}    ${fir}    ${chip_address}    ${threshold_limit}
    ...    ${signature_desc}    ${log_prefix}

    # Description of argument(s):
    # interface_type     Inject error through 'BMC' or 'HOST'.
    # fir                FIR (Fault isolation register) value (e.g. 2011400).
    # chip_address       Chip address (e.g. 2000000000000000).
    # threshold_limit    Threshold limit (e.g. 1, 5, 32).
    # signature_desc     Error log signature description.
    # log_prefix         Log path prefix.

    Run Keyword    Inject Error Through ${interface_type}
    ...    ${fir}    ${chip_address}    ${threshold_limit}    ${master_proc_chip}

    Is Host Running
    ${output}=    Gard Operations On OS    list
    Should Contain    ${output}    No GARD
    Verify Error Log Entry    ${signature_desc}    ${log_prefix}
    # TODO: Verify SOL console logs.

Inject Unrecoverable Error
    [Documentation]    Inject and verify unrecoverable error on processor through
    ...                BMC/HOST.
    ...                Test sequence:
    ...                1. Inject unrecoverable error on a given target
    ...                (e.g. processor core, CAPP, MCA) through BMC/HOST.
    ...                2. Check if HOST is rebooted.
    ...                3. Verify error log entry & signature description.
    ...                4. Verify & clear dump entry.
    ...                5. Verify & clear gard records.
    [Arguments]    ${interface_type}    ${fir}    ${chip_address}    ${threshold_limit}
    ...    ${signature_desc}    ${log_prefix}    ${bmc_reboot}=${0}

    # Description of argument(s):
    # interface_type     Inject error through 'BMC' or 'HOST'.
    # fir                FIR (Fault isolation register) value (e.g. 2011400).
    # chip_address       Chip address (e.g. 2000000000000000).
    # threshold_limit    Threshold limit (e.g. 1, 5, 32).
    # signature_desc     Error log signature description
    #                    (e.g. 'mcs(n0p0c0) (MCFIR[0]) mc internal recoverable').
    # log_prefix         Log path prefix.
    # bmc_reboot         Reboot the BMC after error injection if set to a
    #                    non-zero value.

    Run Keyword    Inject Error Through ${interface_type}
    ...    ${fir}    ${chip_address}    ${threshold_limit}    ${master_proc_chip}

    # Reboot the BMC after error injection if requested; otherwise wait for
    # the host to reboot on its own.
    Run Keyword If    ${bmc_reboot}    Run Keywords
    ...    Initiate BMC Reboot
    ...    Wait For BMC Ready
    ...    Initiate Host PowerOff
    ...    Initiate Host Boot
    ...    ELSE
    ...    Wait Until Keyword Succeeds    500 sec    20 sec    Is Host Rebooted

    Wait for OS
    Verify Error Log Entry    ${signature_desc}    ${log_prefix}
    Read Properties    ${DUMP_ENTRY_URI}list
    Delete All BMC Dump
    Verify And Clear Gard Records On HOST

Fetch FIR Address Translation Value
    [Documentation]    Fetch FIR address translation value through HOST.
    [Arguments]    ${fir}    ${target_type}

    # Description of argument(s):
    # fir            FIR (Fault isolation register) value (e.g. '2011400').
    # target_type    Target type (e.g. 'EX', 'EQ', 'C').

    Login To OS Host
    Copy Address Translation Utils To HOST OS

    # Fetch processor chip IDs.
    ${proc_chip_id}=    Get ProcChipId From OS    Processor    ${master_proc_chip}
    # Example output:
    # 00000000

    ${core_ids}=    Get Core IDs From OS    ${proc_chip_id[-1]}
    # Example command and output:
    # ./probe_cpus.sh | grep 'CHIP ID: 0' | cut -c21-22
    # ['14', '15', '16', '17']

    # Ignore the master core ID (the first entry in the list).
    ${non_master_core_ids}=    Get Slice From List    ${core_ids}    1

    # Fetch a random non-master core ID.
    ${core_ids_sub_list}=    Evaluate    random.sample(${non_master_core_ids}, 1)    random
    ${core_id}=    Get From List    ${core_ids_sub_list}    0

    ${translated_fir_addr}=    FIR Address Translation Through HOST
    ...    ${fir}    ${core_id}    ${target_type}

    [Return]    ${translated_fir_addr}

RAS Test SetUp
    [Documentation]    Validate input parameters and boot the host to OS.

    Should Not Be Empty
    ...    ${OS_HOST}    msg=You must provide DNS name/IP of the OS host.
    Should Not Be Empty
    ...    ${OS_USERNAME}    msg=You must provide OS host user name.
    Should Not Be Empty
    ...    ${OS_PASSWORD}    msg=You must provide OS host user password.

    Smart Power Off

    # Boot to OS.
    REST Power On    quiet=${1}
    # Give the host time to settle after it comes up.
    Sleep    60s

RAS Suite Setup
    [Documentation]    Create RAS log directory to store all RAS test logs,
    ...                add the eSEL tools to PATH, boot the host to OS and
    ...                make sure the Opal-PRD service is enabled on the host.

    ${RAS_LOG_DIR_PATH}=    Catenate    ${EXECDIR}/RAS_logs/
    Set Suite Variable    ${RAS_LOG_DIR_PATH}
    Set Suite Variable    ${master_proc_chip}    False

    Create Directory    ${RAS_LOG_DIR_PATH}
    OperatingSystem.Directory Should Exist    ${RAS_LOG_DIR_PATH}
    Empty Directory    ${RAS_LOG_DIR_PATH}

    Should Not Be Empty    ${ESEL_BIN_PATH}
    Set Environment Variable    PATH    %{PATH}:${ESEL_BIN_PATH}

    # Boot to OS.
    REST Power On    quiet=${1}

    # Check whether the Opal-PRD service is enabled on the host; enable it if not.
    ${opal_prd_state}=    Is Opal-PRD Service Enabled
    Run Keyword If    '${opal_prd_state}' == 'disabled'
    ...    Enable Opal-PRD Service On HOST

RAS Suite Cleanup
    [Documentation]    Perform RAS suite cleanup and verify that host
    ...                boots after test suite run.

    # Boot to OS.
    REST Power On
    Delete Error Logs
    Gard Operations On OS    clear all

Inject Error At HOST Boot Path
    [Documentation]    Inject and verify recoverable error on processor through
    ...                BMC using pdbg tool at HOST Boot path.
    ...                Test sequence:
    ...                1. Inject error on a given target
    ...                (e.g. processor core, CAPP, MCA) through BMC using
    ...                pdbg tool at HOST Boot path.
    ...                2. Check if HOST is rebooted and running.
    ...                3. Verify error log entry & signature description.
    ...                4. Verify & clear gard records.
    [Arguments]    ${fir}    ${chip_address}    ${signature_desc}    ${log_prefix}

    # Description of argument(s):
    # fir               FIR (Fault isolation register) value (e.g. 2011400).
    # chip_address      Chip address (e.g. 2000000000000000).
    # signature_desc    Error log signature description.
    # log_prefix        Log path prefix.

    Inject Error Through BMC At HOST Boot    ${fir}    ${chip_address}
    Wait Until Keyword Succeeds    500 sec    20 sec    Is Host Rebooted
    Wait for OS
    Verify Error Log Entry    ${signature_desc}    ${log_prefix}
    Verify And Clear Gard Records On HOST