*** Settings ***
Documentation       This suite tests checkstop operations through HOST.
Resource            ../lib/utils.robot
Resource            ../lib/openbmc_ffdc.robot
Resource            ../lib/ras/host_utils.robot
Resource            ../lib/resource.txt
Resource            ../lib/state_manager.robot
Resource            ../lib/openbmc_ffdc_methods.robot
Resource            ../lib/boot_utils.robot
Variables           ../lib/ras/variables.py

Library             DateTime
Library             OperatingSystem
Library             random
Library             Collections

Suite Setup         RAS Suite Setup
Test Setup          RAS Test Setup
Test Teardown       FFDC On Test Case Fail
Suite Teardown      RAS Suite Cleanup

*** Variables ***
${stack_mode}       normal

*** Test Cases ***

# Memory channel (MCACALIFIR) related error injection.

Verify Recoverable Callout Handling For MCA With Threshold 1
    [Documentation]  Verify recoverable callout handling for MCACALIFIR with
    ...              threshold 1.
    [Tags]  Verify_Recoverable_Callout_Handling_For_MCA_With_Threshold_1

    ${value}=  Get From Dictionary  ${ERROR_INJECT_DICT}  MCACALIFIR_RECV1
    ${err_log_path}=  Catenate  ${RAS_LOG_DIR_PATH}mcacalfir_th1
    Inject Recoverable Error With Threshold Limit Through Host
    ...  ${value[0]}  ${value[1]}  1  ${value[2]}  ${err_log_path}

Verify Recoverable Callout Handling For MCA With Threshold 32
    [Documentation]  Verify recoverable callout handling for MCACALIFIR with
    ...              threshold 32.
    [Tags]  Verify_Recoverable_Callout_Handling_For_MCA_With_Threshold_32

    ${value}=  Get From Dictionary  ${ERROR_INJECT_DICT}  MCACALIFIR_RECV32
    ${err_log_path}=  Catenate  ${RAS_LOG_DIR_PATH}mcacalfir_th32
    Inject Recoverable Error With Threshold Limit Through Host
    ...  ${value[0]}  ${value[1]}  32  ${value[2]}  ${err_log_path}

Verify Unrecoverable Callout Handling For MCA
    [Documentation]  Verify unrecoverable callout handling for MCACALIFIR.
    [Tags]  Verify_Unrecoverable_Callout_Handling_For_MCA

    ${value}=  Get From Dictionary  ${ERROR_INJECT_DICT}  MCACALIFIR_UE
    ${err_log_path}=  Catenate  ${RAS_LOG_DIR_PATH}mcacalfir
    Inject Unrecoverable Error Through Host
    ...  ${value[0]}  ${value[1]}  1  ${value[2]}  ${err_log_path}

# Memory buffer (MCIFIR) related error injection.

Verify Recoverable Callout Handling For MCI With Threshold 1
    [Documentation]  Verify recoverable callout handling for MCI with
    ...              threshold 1.
    [Tags]  Verify_Recoverable_Callout_Handling_For_MCI_With_Threshold_1

    ${value}=  Get From Dictionary  ${ERROR_INJECT_DICT}  MCS_RECV1
    ${err_log_path}=  Catenate  ${RAS_LOG_DIR_PATH}mcifir_th1
    Inject Recoverable Error With Threshold Limit Through Host
    ...  ${value[0]}  ${value[1]}  1  ${value[2]}  ${err_log_path}

Verify Unrecoverable Callout Handling For MCI
    [Documentation]  Verify unrecoverable callout handling for MCI.
    [Tags]  Verify_Unrecoverable_Callout_Handling_For_MCI

    ${value}=  Get From Dictionary  ${ERROR_INJECT_DICT}  MCS_UE
    ${err_log_path}=  Catenate  ${RAS_LOG_DIR_PATH}mcifir
    Inject Unrecoverable Error Through Host
    ...  ${value[0]}  ${value[1]}  1  ${value[2]}  ${err_log_path}

# CAPP accelerator (CXAFIR) related error injection.

Verify Recoverable Callout Handling For CXA With Threshold 5
    [Documentation]  Verify recoverable callout handling for CXA with
    ...              threshold 5.
    [Tags]  Verify_Recoverable_Callout_Handling_For_CXA_With_Threshold_5

    ${value}=  Get From Dictionary  ${ERROR_INJECT_DICT}  CXA_RECV5
    ${err_log_path}=  Catenate  ${RAS_LOG_DIR_PATH}cxafir_th5
    Inject Recoverable Error With Threshold Limit Through Host
    ...  ${value[0]}  ${value[1]}  5  ${value[2]}  ${err_log_path}

Verify Recoverable Callout Handling For CXA With Threshold 32
    [Documentation]  Verify recoverable callout handling for CXA with
    ...              threshold 32.
    [Tags]  Verify_Recoverable_Callout_Handling_For_CXA_With_Threshold_32

    ${value}=  Get From Dictionary  ${ERROR_INJECT_DICT}  CXA_RECV32
    ${err_log_path}=  Catenate  ${RAS_LOG_DIR_PATH}cxafir_th32
    Inject Recoverable Error With Threshold Limit Through Host
    ...  ${value[0]}  ${value[1]}  32  ${value[2]}  ${err_log_path}

# OBUSFIR related error injection.

Verify Recoverable Callout Handling For OBUS With Threshold 32
    [Documentation]  Verify recoverable callout handling for OBUS with
    ...              threshold 32.
    [Tags]  Verify_Recoverable_Callout_Handling_For_OBUS_With_Threshold_32

    ${value}=  Get From Dictionary  ${ERROR_INJECT_DICT}  OBUS_RECV32
    ${err_log_path}=  Catenate  ${RAS_LOG_DIR_PATH}obusfir_th32
    Inject Recoverable Error With Threshold Limit Through Host
    ...  ${value[0]}  ${value[1]}  32  ${value[2]}  ${err_log_path}

# Nvidia graphics processing units (NPU0FIR) related error injection.

Verify Recoverable Callout Handling For NPU0 With Threshold 32
    [Documentation]  Verify recoverable callout handling for NPU0 with
    ...              threshold 32.
    [Tags]  Verify_Recoverable_Callout_Handling_For_NPU0_With_Threshold_32

    ${value}=  Get From Dictionary  ${ERROR_INJECT_DICT}  NPU0_RECV32
    ${err_log_path}=  Catenate  ${RAS_LOG_DIR_PATH}npu0fir_th32
    Inject Recoverable Error With Threshold Limit Through Host
    ...  ${value[0]}  ${value[1]}  32  ${value[2]}  ${err_log_path}

# Nest accelerator NXDMAENGFIR related error injection.

Verify Recoverable Callout Handling For NXDMAENG With Threshold 1
    [Documentation]  Verify recoverable callout handling for NXDMAENG with
    ...              threshold 1.
    [Tags]  Verify_Recoverable_Callout_Handling_For_NXDMAENG_With_Threshold_1

    ${value}=  Get From Dictionary  ${ERROR_INJECT_DICT}  NX_RECV1
    ${err_log_path}=  Catenate  ${RAS_LOG_DIR_PATH}nxfir_th1
    Inject Recoverable Error With Threshold Limit Through Host
    ...  ${value[0]}  ${value[1]}  1  ${value[2]}  ${err_log_path}

Verify Recoverable Callout Handling For NXDMAENG With Threshold 32
    [Documentation]  Verify recoverable callout handling for NXDMAENG with
    ...              threshold 32.
    [Tags]  Verify_Recoverable_Callout_Handling_For_NXDMAENG_With_Threshold_32

    ${value}=  Get From Dictionary  ${ERROR_INJECT_DICT}  NX_RECV32
    ${err_log_path}=  Catenate  ${RAS_LOG_DIR_PATH}nxfir_th32
    Inject Recoverable Error With Threshold Limit Through Host
    ...  ${value[0]}  ${value[1]}  32  ${value[2]}  ${err_log_path}

Verify Unrecoverable Callout Handling For NXDMAENG
    [Documentation]  Verify unrecoverable callout handling for NXDMAENG.
    [Tags]  Verify_Unrecoverable_Callout_Handling_For_NXDMAENG

    ${value}=  Get From Dictionary  ${ERROR_INJECT_DICT}  NX_UE
    ${err_log_path}=  Catenate  ${RAS_LOG_DIR_PATH}nxfir_ue
    Inject Unrecoverable Error Through Host
    ...  ${value[0]}  ${value[1]}  1  ${value[2]}  ${err_log_path}

*** Keywords ***

Inject Error Through HOST
    [Documentation]  Inject checkstop on processor through HOST.
    ...              Test sequence:
    ...              1. Boot To HOST
    ...              2. Clear any existing gard records
    ...              3. Inject Error on processor/centaur
    [Arguments]  ${fir}  ${chip_address}  ${threshold_limit}
    # Description of argument(s):
    # fir                 FIR (Fault isolation register) value (e.g. 2011400).
    # chip_address        Chip address (e.g. 2000000000000000).
    # threshold_limit     Threshold limit (e.g. 1, 5, 32).

    Delete Error Logs
    Login To OS Host
    Gard Operations On OS  clear all

    # Fetch processor chip IDs.
    ${chip_ids}=  Get ProcChipId From OS  Processor
    ${proc_ids}=  Split String  ${chip_ids}
    ${proc_id}=  Get From List  ${proc_ids}  1

    ${threshold_limit}=  Convert To Integer  ${threshold_limit}
    :FOR  ${i}  IN RANGE  ${threshold_limit}
    \  Run Keyword  Putscom Operations On OS  ${proc_id}  ${fir}  ${chip_address}
    # Adding delay after each error injection.
    \  Sleep  10s
    # Adding delay to get error log after error injection.
    Sleep  120s

Verify And Clear Gard Records On HOST
    [Documentation]  Verify that gard records exist on HOST, then clear them.

    ${output}=  Gard Operations On OS  list
    # No quotes around the expected text: Robot Framework would treat them as
    # part of the string, and the check would then always pass.
    Should Not Contain  ${output}  No GARD entries to display
    Gard Operations On OS  clear all

Verify Error Log Entry
    [Documentation]  Verify error log entry & signature description.
    [Arguments]  ${signature_desc}  ${log_prefix}
    # Description of argument(s):
    # signature_desc  Error log signature description.
    # log_prefix      Log path prefix.

    ${resp}=  OpenBMC Get Request  ${BMC_LOGGING_ENTRY}/list
    Should Not Be Equal As Strings  ${resp.status_code}  ${HTTP_NOT_FOUND}

    Collect eSEL Log  ${log_prefix}
    ${error_log_file_path}=  Catenate  ${log_prefix}esel.txt
    ${rc}  ${output}=  Run and Return RC and Output
    ...  grep -i ${signature_desc} ${error_log_file_path}
    Should Not Be Empty  ${output}

Inject Recoverable Error With Threshold Limit Through Host
    [Documentation]  Inject and verify recoverable error on processor through
    ...              host.
    ...              Test sequence:
    ...              1. Enable Auto Reboot Setting
    ...              2. Inject Error on processor/centaur
    ...              3. Check If HOST is running.
    ...              4. Verify that no gard records exist.
    ...              5. Verify error log entry & signature description.
    [Arguments]  ${fir}  ${chip_address}  ${threshold_limit}
    ...          ${signature_desc}  ${log_prefix}
    # Description of argument(s):
    # fir                 FIR (Fault isolation register) value (e.g. 2011400).
    # chip_address        Chip address (e.g. 2000000000000000).
    # threshold_limit     Threshold limit (e.g. 1, 5, 32).
    # signature_desc      Error log signature description.
    # log_prefix          Log path prefix.

    Set Auto Reboot  1
    Inject Error Through HOST  ${fir}  ${chip_address}  ${threshold_limit}
    Is Host Running
    ${output}=  Gard Operations On OS  list
    Should Contain  ${output}  No GARD
    Verify Error Log Entry  ${signature_desc}  ${log_prefix}

Inject Unrecoverable Error Through Host
    [Documentation]  Inject and verify unrecoverable error on processor through
    ...              host.
    ...              Test sequence:
    ...              1. Enable Auto Reboot Setting
    ...              2. Inject Error on processor/centaur
    ...              3. Check If HOST is rebooted.
    ...              4. Verify & clear gard records.
    ...              5. Verify error log entry & signature description.
    [Arguments]  ${fir}  ${chip_address}  ${threshold_limit}
    ...          ${signature_desc}  ${log_prefix}
    # Description of argument(s):
    # fir                 FIR (Fault isolation register) value (e.g. 2011400).
    # chip_address        Chip address (e.g. 2000000000000000).
    # threshold_limit     Threshold limit (e.g. 1, 5, 32).
    # signature_desc      Error log signature description
    #                     (e.g. 'mcs(n0p0c0) (MCFIR[0]) mc internal recoverable').
    # log_prefix          Log path prefix.

    Set Auto Reboot  1
    Inject Error Through HOST  ${fir}  ${chip_address}  ${threshold_limit}
    Wait Until Keyword Succeeds  500 sec  20 sec  Is Host Rebooted
    Wait for OS
    Verify And Clear Gard Records On HOST
    Verify Error Log Entry  ${signature_desc}  ${log_prefix}

Fetch FIR Address Translation Value
    [Documentation]  Fetch FIR address translation value through HOST.
    [Arguments]  ${proc_chip_id}  ${fir}  ${target_type}
    # Description of argument(s):
    # proc_chip_id        Processor chip ID (e.g. '0', '8').
    # fir                 FIR (Fault isolation register) value (e.g. 2011400).
    # target_type         Target type (e.g. 'EX', 'EQ', 'C').

    Login To OS Host
    Copy Address Translation Utils To HOST OS

    ${core_ids}=  Get Core IDs From OS  0
    # Ignoring master core ID.
    ${non_master_core_ids}=  Get Slice From List  ${core_ids}  1
    # Fetch random non-master core ID (e.g. 9).
    ${core_ids_sub_list}=  Evaluate
    ...  random.sample(${non_master_core_ids}, 1)  random
    ${core_id}=  Get From List  ${core_ids_sub_list}  0
    ${translated_fir_addr}=  FIR Address Translation Through HOST
    ...  ${fir}  ${core_id}  ${target_type}

    [Return]  ${translated_fir_addr}

RAS Test Setup
    [Documentation]  Validate input parameters and boot the host to OS.

    Should Not Be Empty
    ...  ${OS_HOST}  msg=You must provide DNS name/IP of the OS host.
    Should Not Be Empty
    ...  ${OS_USERNAME}  msg=You must provide OS host user name.
    Should Not Be Empty
    ...  ${OS_PASSWORD}  msg=You must provide OS host user password.

    # Boot to OS.
    REST Power On  quiet=${1}
    # Adding delay after host bring up.
    Sleep  60s

RAS Suite Setup
    [Documentation]  Create RAS log directory to store all RAS test logs.

    ${RAS_LOG_DIR_PATH}=  Catenate  ${EXECDIR}/RAS_logs/
    Set Suite Variable  ${RAS_LOG_DIR_PATH}
    Create Directory  ${RAS_LOG_DIR_PATH}
    OperatingSystem.Directory Should Exist  ${RAS_LOG_DIR_PATH}
    Empty Directory  ${RAS_LOG_DIR_PATH}

RAS Suite Cleanup
    [Documentation]  Perform RAS suite cleanup and verify that host
    ...              boots after test suite run.

    # Boot to OS.
    REST Power On  quiet=${1}
    Delete Error Logs
    Gard Operations On OS  clear all