blob: fa82d9414b7047e8db8aa45e8a3f1e01e9b2c16b [file] [log] [blame] [edit]
*** Settings ***
Documentation Utility for RAS test scenarios through HOST & BMC.
Resource ../../lib/utils.robot
Resource ../../lib/ras/host_utils.robot
Resource ../../lib/resource.robot
Resource ../../lib/state_manager.robot
Resource ../../lib/boot_utils.robot
Variables ../../lib/ras/variables.py
Variables ../../data/variables.py
Resource ../../lib/dump_utils.robot
Library DateTime
Library OperatingSystem
Library random
Library Collections
*** Variables ***
${stack_mode} normal
*** Keywords ***
Verify And Clear Gard Records On HOST
[Documentation] Verify And Clear gard records on HOST.
${output}= Gard Operations On OS list
Should Not Contain ${output} No GARD
Gard Operations On OS clear all
Verify Error Log Entry
[Documentation] Verify error log entry & signature description.
[Arguments] ${signature_desc} ${log_prefix}
# Description of argument(s):
# signature_desc Error log signature description.
# log_prefix Log path prefix.
# TODO: Need to move this keyword to common utility.
Error Logs Should Exist
Collect eSEL Log ${log_prefix}
${error_log_file_path}= Catenate ${log_prefix}esel.txt
${rc} ${output}= Run and Return RC and Output
... grep -i ${signature_desc} ${error_log_file_path}
Should Be Equal ${rc} ${0}
Should Not Be Empty ${output}
Inject Recoverable Error With Threshold Limit
[Documentation] Inject and verify recoverable error on processor through
... BMC/HOST.
... Test sequence:
... 1. Inject recoverable error on a given target
... (e.g: Processor core, CAPP, MCA) through BMC/HOST.
... 2. Check If HOST is running.
... 3. Verify error log entry & signature description.
... 4. Verify & clear gard records.
[Arguments] ${interface_type} ${fir} ${chip_address} ${threshold_limit}
... ${signature_desc} ${log_prefix}
# Description of argument(s):
# interface_type Inject error through 'BMC' or 'HOST'.
# fir FIR (Fault isolation register) value (e.g. 2011400).
# chip_address Chip address (e.g 2000000000000000).
# threshold_limit Threshold limit (e.g 1, 5, 32).
# signature_desc Error log signature description.
# log_prefix Log path prefix.
Run Keyword If '${interface_type}' == 'HOST'
... Inject Error Through HOST ${fir} ${chip_address} ${threshold_limit}
... ${master_proc_chip}
... ELSE
... Inject Error Through BMC ${fir} ${chip_address} ${threshold_limit}
... ${master_proc_chip}
Is Host Running
${output}= Gard Operations On OS list
Should Contain ${output} No GARD
Verify Error Log Entry ${signature_desc} ${log_prefix}
# TODO: Verify SOL console logs.
Inject Unrecoverable Error
[Documentation] Inject and verify unrecoverable error on processor through
... BMC/HOST.
... Test sequence:
... 1. Inject unrecoverable error on a given target
... (e.g: Processor core, CAPP, MCA) through BMC/HOST.
... 2. Check If HOST is rebooted.
... 3. Verify & clear gard records.
... 4. Verify error log entry & signature description.
... 5. Verify & clear dump entry.
[Arguments] ${interface_type} ${fir} ${chip_address} ${threshold_limit}
... ${signature_desc} ${log_prefix}
# Description of argument(s):
# interface_type Inject error through 'BMC' or 'HOST'.
# fir FIR (Fault isolation register) value (e.g. 2011400).
# chip_address Chip address (e.g 2000000000000000).
# threshold_limit Threshold limit (e.g 1, 5, 32).
# signature_desc Error Log signature description.
# (e.g 'mcs(n0p0c0) (MCFIR[0]) mc internal recoverable')
# log_prefix Log path prefix.
Run Keyword If '${interface_type}' == 'HOST'
... Inject Error Through HOST ${fir} ${chip_address} ${threshold_limit}
... ${master_proc_chip}
... ELSE
... Inject Error Through BMC ${fir} ${chip_address} ${threshold_limit}
... ${master_proc_chip}
Wait Until Keyword Succeeds 500 sec 20 sec Is Host Rebooted
Wait for OS
Verify Error Log Entry ${signature_desc} ${log_prefix}
${resp}= OpenBMC Get Request ${DUMP_ENTRY_URI}list
Should Not Be Equal As Strings ${resp.status_code} ${HTTP_NOT_FOUND}
Delete All BMC Dump
Verify And Clear Gard Records On HOST
Fetch FIR Address Translation Value
[Documentation] Fetch FIR address translation value through HOST.
[Arguments] ${fir} ${target_type}
# Description of argument(s):
# fir FIR (Fault isolation register) value (e.g. '2011400').
# core_id Core ID (e.g. '9').
# target_type Target type (e.g. 'EX', 'EQ', 'C').
Login To OS Host
Copy Address Translation Utils To HOST OS
# Fetch processor chip IDs.
${proc_chip_id}= Get ProcChipId From OS Processor ${master_proc_chip}
# Example output:
# 00000000
${core_ids}= Get Core IDs From OS ${proc_chip_id[-1]}
# Example output:
#./probe_cpus.sh | grep 'CHIP ID: 0' | cut -c21-22
# ['14', '15', '16', '17']
# Ignoring master core ID.
${output}= Get Slice From List ${core_ids} 1
# Feth random non-master core ID.
${core_ids_sub_list}= Evaluate random.sample(${core_ids}, 1) random
${core_id}= Get From List ${core_ids_sub_list} 0
${translated_fir_addr}= FIR Address Translation Through HOST
... ${fir} ${core_id} ${target_type}
[Return] ${translated_fir_addr}
RAS Test SetUp
[Documentation] Validates input parameters.
Should Not Be Empty
... ${OS_HOST} msg=You must provide DNS name/IP of the OS host.
Should Not Be Empty
... ${OS_USERNAME} msg=You must provide OS host user name.
Should Not Be Empty
... ${OS_PASSWORD} msg=You must provide OS host user password.
# Boot to OS.
REST Power On quiet=${1}
# Adding delay after host bring up.
Sleep 60s
RAS Suite Setup
[Documentation] Create RAS log directory to store all RAS test logs.
${RAS_LOG_DIR_PATH}= Catenate ${EXECDIR}/RAS_logs/
Set Suite Variable ${RAS_LOG_DIR_PATH}
Set Suite Variable ${master_proc_chip} False
Create Directory ${RAS_LOG_DIR_PATH}
OperatingSystem.Directory Should Exist ${RAS_LOG_DIR_PATH}
Empty Directory ${RAS_LOG_DIR_PATH}
Should Not Be Empty ${ESEL_BIN_PATH}
Set Environment Variable PATH %{PATH}:${ESEL_BIN_PATH}
# Boot to Os.
REST Power On quiet=${1}
# Check Opal-PRD service enabled on host.
${opal_prd_state}= Is Opal-PRD Service Enabled
Run Keyword If '${opal_prd_state}' == 'disabled'
... Enable Opal-PRD Service On HOST
RAS Suite Cleanup
[Documentation] Perform RAS suite cleanup and verify that host
... boots after test suite run.
# Boot to OS.
REST Power On quiet=${1}
Delete Error Logs
Gard Operations On OS clear all
Inject Error At HOST Boot Path
[Documentation] Inject and verify recoverable error on processor through
... BMC using pdbg tool at HOST Boot path.
... Test sequence:
... 1. Inject error on a given target
... (e.g: Processor core, CAPP, MCA) through BMC using
... pdbg tool at HOST Boot path.
... 2. Check If HOST is rebooted and running.
... 3. Verify error log entry & signature description.
... 4. Verify & clear gard records.
[Arguments] ${fir} ${chip_address} ${signature_desc} ${log_prefix}
# Description of argument(s):
# fir FIR (Fault isolation register) value (e.g. 2011400).
# chip_address Chip address (e.g 2000000000000000).
# signature_desc Error log signature description.
# log_prefix Log path prefix.
Inject Error Through BMC At HOST Boot ${fir} ${chip_address}
Wait Until Keyword Succeeds 500 sec 20 sec Is Host Rebooted
Wait for OS
Verify Error Log Entry ${signature_desc} ${log_prefix}
Verify And Clear Gard Records On HOST