*** Settings ***
Documentation       This suite tests checkstop operations through HOST.
Resource            ../lib/utils.robot
Resource            ../lib/openbmc_ffdc.robot
Resource            ../lib/ras/host_utils.robot
Resource            ../lib/resource.txt
Resource            ../lib/state_manager.robot
Resource            ../lib/openbmc_ffdc_methods.robot
Resource            ../lib/boot_utils.robot
Variables           ../lib/ras/variables.py

Library             DateTime
Library             OperatingSystem
Library             random
Library             Collections

Suite Setup         RAS Suite Setup
Test Setup          RAS Test Setup
Test Teardown       FFDC On Test Case Fail
Suite Teardown      RAS Suite Cleanup

*** Variables ***
${stack_mode}       normal

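# Note on ${ERROR_INJECT_DICT} (loaded from ../lib/ras/variables.py):
# the test cases below index each entry as a three-item list. The layout
# sketched here is only inferred from how ${value[0]}, ${value[1]} and
# ${value[2]} are passed to the injection keywords. The key name and the
# values are illustrative (borrowed from the argument examples in the
# Keywords section), not copied from variables.py:
#
#   'EXAMPLE_KEY': ['2011400',            # value[0]: FIR value
#                   '2000000000000000',   # value[1]: chip address
#                   'signature text'],    # value[2]: error log signature
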
*** Test Cases ***

# Memory channel (MCACALIFIR) related error injection.

Verify Recoverable Callout Handling For MCA With Threshold 1
    [Documentation]  Verify recoverable callout handling for MCACALIFIR with
    ...              threshold 1.
    [Tags]  Verify_Recoverable_Callout_Handling_For_MCA_With_Threshold_1

    ${value}=  Get From Dictionary  ${ERROR_INJECT_DICT}  MCACALIFIR_RECV1
    ${err_log_path}=  Catenate  ${RAS_LOG_DIR_PATH}mcacalfir_th1
    Inject Recoverable Error With Threshold Limit Through Host
    ...  ${value[0]}  ${value[1]}  1  ${value[2]}  ${err_log_path}

Verify Recoverable Callout Handling For MCA With Threshold 32
    [Documentation]  Verify recoverable callout handling for MCACALIFIR with
    ...              threshold 32.
    [Tags]  Verify_Recoverable_Callout_Handling_For_MCA_With_Threshold_32

    ${value}=  Get From Dictionary  ${ERROR_INJECT_DICT}  MCACALIFIR_RECV32
    ${err_log_path}=  Catenate  ${RAS_LOG_DIR_PATH}mcacalfir_th32
    Inject Recoverable Error With Threshold Limit Through Host
    ...  ${value[0]}  ${value[1]}  32  ${value[2]}  ${err_log_path}

Verify Unrecoverable Callout Handling For MCA
    [Documentation]  Verify unrecoverable callout handling for MCACALIFIR.
    [Tags]  Verify_Unrecoverable_Callout_Handling_For_MCA

    ${value}=  Get From Dictionary  ${ERROR_INJECT_DICT}  MCACALIFIR_UE
    ${err_log_path}=  Catenate  ${RAS_LOG_DIR_PATH}mcacalfir
    Inject Unrecoverable Error Through Host
    ...  ${value[0]}  ${value[1]}  1  ${value[2]}  ${err_log_path}

# Memory buffer (MCIFIR) related error injection.

Verify Recoverable Callout Handling For MCI With Threshold 1
    [Documentation]  Verify recoverable callout handling for MCI with
    ...              threshold 1.
    [Tags]  Verify_Recoverable_Callout_Handling_For_MCI_With_Threshold_1

    ${value}=  Get From Dictionary  ${ERROR_INJECT_DICT}  MCS_RECV1
    ${err_log_path}=  Catenate  ${RAS_LOG_DIR_PATH}mcifir_th1
    Inject Recoverable Error With Threshold Limit Through Host
    ...  ${value[0]}  ${value[1]}  1  ${value[2]}  ${err_log_path}

Verify Unrecoverable Callout Handling For MCI
    [Documentation]  Verify unrecoverable callout handling for MCI.
    [Tags]  Verify_Unrecoverable_Callout_Handling_For_MCI

    ${value}=  Get From Dictionary  ${ERROR_INJECT_DICT}  MCS_UE
    ${err_log_path}=  Catenate  ${RAS_LOG_DIR_PATH}mcifir
    Inject Unrecoverable Error Through Host
    ...  ${value[0]}  ${value[1]}  1  ${value[2]}  ${err_log_path}

# CAPP accelerator (CXAFIR) related error injection.

Verify Recoverable Callout Handling For CXA With Threshold 5
    [Documentation]  Verify recoverable callout handling for CXA with
    ...              threshold 5.
    [Tags]  Verify_Recoverable_Callout_Handling_For_CXA_With_Threshold_5

    ${value}=  Get From Dictionary  ${ERROR_INJECT_DICT}  CXA_RECV5
    ${err_log_path}=  Catenate  ${RAS_LOG_DIR_PATH}cxafir_th5
    Inject Recoverable Error With Threshold Limit Through Host
    ...  ${value[0]}  ${value[1]}  5  ${value[2]}  ${err_log_path}

Verify Recoverable Callout Handling For CXA With Threshold 32
    [Documentation]  Verify recoverable callout handling for CXA with
    ...              threshold 32.
    [Tags]  Verify_Recoverable_Callout_Handling_For_CXA_With_Threshold_32

    ${value}=  Get From Dictionary  ${ERROR_INJECT_DICT}  CXA_RECV32
    ${err_log_path}=  Catenate  ${RAS_LOG_DIR_PATH}cxafir_th32
    Inject Recoverable Error With Threshold Limit Through Host
    ...  ${value[0]}  ${value[1]}  32  ${value[2]}  ${err_log_path}

# OBUSFIR related error injection.

Verify Recoverable Callout Handling For OBUS With Threshold 32
    [Documentation]  Verify recoverable callout handling for OBUS with
    ...              threshold 32.
    [Tags]  Verify_Recoverable_Callout_Handling_For_OBUS_With_Threshold_32

    ${value}=  Get From Dictionary  ${ERROR_INJECT_DICT}  OBUS_RECV32
    ${err_log_path}=  Catenate  ${RAS_LOG_DIR_PATH}obusfir_th32
    Inject Recoverable Error With Threshold Limit Through Host
    ...  ${value[0]}  ${value[1]}  32  ${value[2]}  ${err_log_path}

# Nvidia graphics processing units (NPU0FIR) related error injection.

Verify Recoverable Callout Handling For NPU0 With Threshold 32
    [Documentation]  Verify recoverable callout handling for NPU0 with
    ...              threshold 32.
    [Tags]  Verify_Recoverable_Callout_Handling_For_NPU0_With_Threshold_32

    ${value}=  Get From Dictionary  ${ERROR_INJECT_DICT}  NPU0_RECV32
    ${err_log_path}=  Catenate  ${RAS_LOG_DIR_PATH}npu0fir_th32
    Inject Recoverable Error With Threshold Limit Through Host
    ...  ${value[0]}  ${value[1]}  32  ${value[2]}  ${err_log_path}

# Nest accelerator NXDMAENGFIR related error injection.

Verify Recoverable Callout Handling For NXDMAENG With Threshold 1
    [Documentation]  Verify recoverable callout handling for NXDMAENG with
    ...              threshold 1.
    [Tags]  Verify_Recoverable_Callout_Handling_For_NXDMAENG_With_Threshold_1

    ${value}=  Get From Dictionary  ${ERROR_INJECT_DICT}  NX_RECV1
    ${err_log_path}=  Catenate  ${RAS_LOG_DIR_PATH}nxfir_th1
    Inject Recoverable Error With Threshold Limit Through Host
    ...  ${value[0]}  ${value[1]}  1  ${value[2]}  ${err_log_path}

Verify Recoverable Callout Handling For NXDMAENG With Threshold 32
    [Documentation]  Verify recoverable callout handling for NXDMAENG with
    ...              threshold 32.
    [Tags]  Verify_Recoverable_Callout_Handling_For_NXDMAENG_With_Threshold_32

    ${value}=  Get From Dictionary  ${ERROR_INJECT_DICT}  NX_RECV32
    ${err_log_path}=  Catenate  ${RAS_LOG_DIR_PATH}nxfir_th32
    Inject Recoverable Error With Threshold Limit Through Host
    ...  ${value[0]}  ${value[1]}  32  ${value[2]}  ${err_log_path}

Verify Unrecoverable Callout Handling For NXDMAENG
    [Documentation]  Verify unrecoverable callout handling for NXDMAENG.
    [Tags]  Verify_Unrecoverable_Callout_Handling_For_NXDMAENG

    ${value}=  Get From Dictionary  ${ERROR_INJECT_DICT}  NX_UE
    ${err_log_path}=  Catenate  ${RAS_LOG_DIR_PATH}nxfir_ue
    Inject Unrecoverable Error Through Host
    ...  ${value[0]}  ${value[1]}  1  ${value[2]}  ${err_log_path}

# L2FIR related error injection.

Verify Recoverable Callout Handling For L2FIR With Threshold 1
    [Documentation]  Verify recoverable callout handling for L2FIR with
    ...              threshold 1.
    [Tags]  Verify_Recoverable_Callout_Handling_For_L2FIR_With_Threshold_1

    ${value}=  Get From Dictionary  ${ERROR_INJECT_DICT}  L2FIR_RECV1
    ${translated_fir}=  Fetch FIR Address Translation Value
    ...  0  ${value[0]}  EX
    ${err_log_path}=  Catenate  ${RAS_LOG_DIR_PATH}l2fir_th1
    Inject Recoverable Error With Threshold Limit Through Host
    ...  ${translated_fir}  ${value[1]}  1  ${value[2]}  ${err_log_path}

# L3FIR related error injection.

Verify Recoverable Callout Handling For L3FIR With Threshold 1
    [Documentation]  Verify recoverable callout handling for L3FIR with
    ...              threshold 1.
    [Tags]  Verify_Recoverable_Callout_Handling_For_L3FIR_With_Threshold_1

    ${value}=  Get From Dictionary  ${ERROR_INJECT_DICT}  L3FIR_RECV1
    ${translated_fir}=  Fetch FIR Address Translation Value
    ...  0  ${value[0]}  EX
    ${err_log_path}=  Catenate  ${RAS_LOG_DIR_PATH}l3fir_th1
    Inject Recoverable Error With Threshold Limit Through Host
    ...  ${translated_fir}  ${value[1]}  1  ${value[2]}  ${err_log_path}

Verify Recoverable Callout Handling For L3FIR With Threshold 32
    [Documentation]  Verify recoverable callout handling for L3FIR with
    ...              threshold 32.
    [Tags]  Verify_Recoverable_Callout_Handling_For_L3FIR_With_Threshold_32

    ${value}=  Get From Dictionary  ${ERROR_INJECT_DICT}  L3FIR_RECV32
    ${translated_fir}=  Fetch FIR Address Translation Value
    ...  0  ${value[0]}  EX
    ${err_log_path}=  Catenate  ${RAS_LOG_DIR_PATH}l3fir_th32
    Inject Recoverable Error With Threshold Limit Through Host
    ...  ${translated_fir}  ${value[1]}  32  ${value[2]}  ${err_log_path}

# On chip controller (OCCFIR) related error injection.

Verify Recoverable Callout Handling For OCC With Threshold 1
    [Documentation]  Verify recoverable callout handling for OCCFIR with
    ...              threshold 1.
    [Tags]  Verify_Recoverable_Callout_Handling_For_OCC_With_Threshold_1

    ${value}=  Get From Dictionary  ${ERROR_INJECT_DICT}  OCCFIR_RECV1
    ${err_log_path}=  Catenate  ${RAS_LOG_DIR_PATH}occfir_th1
    Inject Recoverable Error With Threshold Limit Through Host
    ...  ${value[0]}  ${value[1]}  1  ${value[2]}  ${err_log_path}

# Core management engine (CMEFIR) related error injection.

Verify Recoverable Callout Handling For CMEFIR With Threshold 1
    [Documentation]  Verify recoverable callout handling for CMEFIR with
    ...              threshold 1.
    [Tags]  Verify_Recoverable_Callout_Handling_For_CMEFIR_With_Threshold_1

    ${value}=  Get From Dictionary  ${ERROR_INJECT_DICT}  CMEFIR_RECV1
    ${translated_fir}=  Fetch FIR Address Translation Value
    ...  0  ${value[0]}  EX
    ${err_log_path}=  Catenate  ${RAS_LOG_DIR_PATH}cmefir_th1
    Inject Recoverable Error With Threshold Limit Through Host
    ...  ${translated_fir}  ${value[1]}  1  ${value[2]}  ${err_log_path}

*** Keywords ***

Inject Error Through HOST
    [Documentation]  Inject checkstop on processor through HOST.
    ...              Test sequence:
    ...              1. Boot To HOST
    ...              2. Clear any existing gard records
    ...              3. Inject Error on processor/centaur
    [Arguments]  ${fir}  ${chip_address}  ${threshold_limit}
    # Description of argument(s):
    # fir              FIR (Fault isolation register) value (e.g. 2011400).
    # chip_address     Chip address (e.g. 2000000000000000).
    # threshold_limit  Threshold limit (e.g. 1, 5, 32).

    Delete Error Logs
    Login To OS Host
    Gard Operations On OS  clear all

    # Fetch processor chip IDs.
    ${chip_ids}=  Get ProcChipId From OS  Processor
    ${proc_ids}=  Split String  ${chip_ids}
    ${proc_id}=  Get From List  ${proc_ids}  1

    ${threshold_limit}=  Convert To Integer  ${threshold_limit}
    :FOR  ${i}  IN RANGE  ${threshold_limit}
    \  Run Keyword  Putscom Operations On OS  ${proc_id}  ${fir}  ${chip_address}
    # Adding delay after each error injection.
    \  Sleep  10s
    # Adding delay to get error log after error injection.
    Sleep  120s

Verify And Clear Gard Records On HOST
    [Documentation]  Verify and clear gard records on HOST.

    ${output}=  Gard Operations On OS  list
    Should Not Contain  ${output}  No GARD entries to display
    Gard Operations On OS  clear all

Verify Error Log Entry
    [Documentation]  Verify error log entry & signature description.
    [Arguments]  ${signature_desc}  ${log_prefix}
    # Description of argument(s):
    # signature_desc  Error log signature description.
    # log_prefix      Log path prefix.

    ${resp}=  OpenBMC Get Request  ${BMC_LOGGING_ENTRY}/list
    Should Not Be Equal As Strings  ${resp.status_code}  ${HTTP_NOT_FOUND}

    Collect eSEL Log  ${log_prefix}
    ${error_log_file_path}=  Catenate  ${log_prefix}esel.txt
    ${rc}  ${output}=  Run And Return RC And Output
    ...  grep -i ${signature_desc} ${error_log_file_path}
    Should Not Be Empty  ${output}

Inject Recoverable Error With Threshold Limit Through Host
    [Documentation]  Inject and verify recoverable error on processor through
    ...              host.
    ...              Test sequence:
    ...              1. Enable Auto Reboot Setting
    ...              2. Inject Error on processor/centaur
    ...              3. Check If HOST is running.
    ...              4. Verify that no gard records exist.
    ...              5. Verify error log entry & signature description.
    [Arguments]  ${fir}  ${chip_address}  ${threshold_limit}
    ...  ${signature_desc}  ${log_prefix}
    # Description of argument(s):
    # fir              FIR (Fault isolation register) value (e.g. 2011400).
    # chip_address     Chip address (e.g. 2000000000000000).
    # threshold_limit  Threshold limit (e.g. 1, 5, 32).
    # signature_desc   Error log signature description.
    # log_prefix       Log path prefix.

    Set Auto Reboot  1
    Inject Error Through HOST  ${fir}  ${chip_address}  ${threshold_limit}
    Is Host Running
    ${output}=  Gard Operations On OS  list
    Should Contain  ${output}  No GARD
    Verify Error Log Entry  ${signature_desc}  ${log_prefix}

Inject Unrecoverable Error Through Host
    [Documentation]  Inject and verify unrecoverable error on processor
    ...              through host.
    ...              Test sequence:
    ...              1. Enable Auto Reboot Setting
    ...              2. Inject Error on processor/centaur
    ...              3. Check If HOST is rebooted.
    ...              4. Verify & clear gard records.
    ...              5. Verify error log entry & signature description.
    [Arguments]  ${fir}  ${chip_address}  ${threshold_limit}
    ...  ${signature_desc}  ${log_prefix}
    # Description of argument(s):
    # fir              FIR (Fault isolation register) value (e.g. 2011400).
    # chip_address     Chip address (e.g. 2000000000000000).
    # threshold_limit  Threshold limit (e.g. 1, 5, 32).
    # signature_desc   Error log signature description.
    #                  (e.g. 'mcs(n0p0c0) (MCFIR[0]) mc internal recoverable')
    # log_prefix       Log path prefix.

    Set Auto Reboot  1
    Inject Error Through HOST  ${fir}  ${chip_address}  ${threshold_limit}
    Wait Until Keyword Succeeds  500 sec  20 sec  Is Host Rebooted
    Wait for OS
    Verify And Clear Gard Records On HOST
    Verify Error Log Entry  ${signature_desc}  ${log_prefix}

Fetch FIR Address Translation Value
    [Documentation]  Fetch FIR address translation value through HOST.
    [Arguments]  ${proc_chip_id}  ${fir}  ${target_type}
    # Description of argument(s):
    # proc_chip_id  Processor chip ID (e.g. '0', '8').
    # fir           FIR (Fault isolation register) value (e.g. 2011400).
    # target_type   Target type (e.g. 'EX', 'EQ', 'C').

    Login To OS Host
    Copy Address Translation Utils To HOST OS

    ${core_ids}=  Get Core IDs From OS  ${proc_chip_id}
    # Ignoring master core ID.
    ${non_master_core_ids}=  Get Slice From List  ${core_ids}  1
    # Fetch random non-master core ID.
    ${core_ids_sub_list}=  Evaluate
    ...  random.sample(${non_master_core_ids}, 1)  random
    ${core_id}=  Get From List  ${core_ids_sub_list}  0
    ${translated_fir_addr}=  FIR Address Translation Through HOST
    ...  ${fir}  ${core_id}  ${target_type}

    [Return]  ${translated_fir_addr}

RAS Test Setup
    [Documentation]  Validate input parameters and boot the host to OS.

    Should Not Be Empty
    ...  ${OS_HOST}  msg=You must provide DNS name/IP of the OS host.
    Should Not Be Empty
    ...  ${OS_USERNAME}  msg=You must provide OS host user name.
    Should Not Be Empty
    ...  ${OS_PASSWORD}  msg=You must provide OS host user password.

    # Boot to OS.
    REST Power On  quiet=${1}
    # Adding delay after host bring up.
    Sleep  60s

RAS Suite Setup
    [Documentation]  Create RAS log directory to store all RAS test logs.

    ${RAS_LOG_DIR_PATH}=  Catenate  ${EXECDIR}/RAS_logs/
    Set Suite Variable  ${RAS_LOG_DIR_PATH}
    Create Directory  ${RAS_LOG_DIR_PATH}
    OperatingSystem.Directory Should Exist  ${RAS_LOG_DIR_PATH}
    Empty Directory  ${RAS_LOG_DIR_PATH}

RAS Suite Cleanup
    [Documentation]  Perform RAS suite cleanup and verify that host
    ...              boots after test suite run.

    # Boot to OS.
    REST Power On  quiet=${1}
    Delete Error Logs
    Gard Operations On OS  clear all