*** Settings ***
Documentation       This suite tests checkstop operations through HOST.
Resource            ../lib/utils.robot
Resource            ../lib/openbmc_ffdc.robot
Resource            ../lib/ras/host_utils.robot
Resource            ../lib/resource.txt
Resource            ../lib/state_manager.robot
Resource            ../lib/openbmc_ffdc_methods.robot
Resource            ../lib/boot_utils.robot
Variables           ../lib/ras/variables.py

Library             DateTime
Library             OperatingSystem
Library             random
Library             Collections

Suite Setup         RAS Suite Setup
Test Setup          RAS Test Setup
Test Teardown       FFDC On Test Case Fail
Suite Teardown      RAS Suite Cleanup

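# Illustrative invocation (a sketch, not part of the original suite).
# OS_HOST, OS_USERNAME and OS_PASSWORD are the variables validated by
# RAS Test Setup; the angle-bracket values and the suite path are
# placeholders, and the imported resources may need further variables:
#   robot -v OS_HOST:<os-host> -v OS_USERNAME:<os-user> \
#         -v OS_PASSWORD:<os-password> <path/to/this/suite>
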
*** Variables ***
${stack_mode}       normal

*** Test Cases ***

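# Note (inferred from how the test cases in this suite index ${value}, not
# from ../lib/ras/variables.py itself): each ERROR_INJECT_DICT entry is
# assumed to be a three-item list. A hypothetical entry, with placeholder
# values, might look like this on the Python side:
#   'EXAMPLE_RECV1': ['<fir>', '<chip_address>', '<signature_desc>']
# so that ${value[0]} is the FIR, ${value[1]} the chip address and
# ${value[2]} the error log signature description.
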
# Memory channel (MCACALIFIR) related error injection.

Verify Recoverable Callout Handling For MCA With Threshold 1
    [Documentation]  Verify recoverable callout handling for MCACALIFIR with
    ...              threshold 1.
    [Tags]  Verify_Recoverable_Callout_Handling_For_MCA_With_Threshold_1

    ${value}=  Get From Dictionary  ${ERROR_INJECT_DICT}  MCACALIFIR_RECV1
    ${err_log_path}=  Catenate  ${RAS_LOG_DIR_PATH}mcacalfir_th1
    Inject Recoverable Error With Threshold Limit Through Host
    ...  ${value[0]}  ${value[1]}  1  ${value[2]}  ${err_log_path}

Verify Recoverable Callout Handling For MCA With Threshold 32
    [Documentation]  Verify recoverable callout handling for MCACALIFIR with
    ...              threshold 32.
    [Tags]  Verify_Recoverable_Callout_Handling_For_MCA_With_Threshold_32

    ${value}=  Get From Dictionary  ${ERROR_INJECT_DICT}  MCACALIFIR_RECV32
    ${err_log_path}=  Catenate  ${RAS_LOG_DIR_PATH}mcacalfir_th32
    Inject Recoverable Error With Threshold Limit Through Host
    ...  ${value[0]}  ${value[1]}  32  ${value[2]}  ${err_log_path}

Verify Unrecoverable Callout Handling For MCA
    [Documentation]  Verify unrecoverable callout handling for MCACALIFIR.
    [Tags]  Verify_Unrecoverable_Callout_Handling_For_MCA

    ${value}=  Get From Dictionary  ${ERROR_INJECT_DICT}  MCACALIFIR_UE
    ${err_log_path}=  Catenate  ${RAS_LOG_DIR_PATH}mcacalfir
    Inject Unrecoverable Error Through Host
    ...  ${value[0]}  ${value[1]}  1  ${value[2]}  ${err_log_path}

# Memory buffer (MCIFIR) related error injection.

Verify Recoverable Callout Handling For MCI With Threshold 1
    [Documentation]  Verify recoverable callout handling for MCI with
    ...              threshold 1.
    [Tags]  Verify_Recoverable_Callout_Handling_For_MCI_With_Threshold_1

    ${value}=  Get From Dictionary  ${ERROR_INJECT_DICT}  MCS_RECV1
    ${err_log_path}=  Catenate  ${RAS_LOG_DIR_PATH}mcifir_th1
    Inject Recoverable Error With Threshold Limit Through Host
    ...  ${value[0]}  ${value[1]}  1  ${value[2]}  ${err_log_path}

Verify Unrecoverable Callout Handling For MCI
    [Documentation]  Verify unrecoverable callout handling for MCI.
    [Tags]  Verify_Unrecoverable_Callout_Handling_For_MCI

    ${value}=  Get From Dictionary  ${ERROR_INJECT_DICT}  MCS_UE
    ${err_log_path}=  Catenate  ${RAS_LOG_DIR_PATH}mcifir
    Inject Unrecoverable Error Through Host
    ...  ${value[0]}  ${value[1]}  1  ${value[2]}  ${err_log_path}

# CAPP accelerator (CXAFIR) related error injection.

Verify Recoverable Callout Handling For CXA With Threshold 5
    [Documentation]  Verify recoverable callout handling for CXA with
    ...              threshold 5.
    [Tags]  Verify_Recoverable_Callout_Handling_For_CXA_With_Threshold_5

    ${value}=  Get From Dictionary  ${ERROR_INJECT_DICT}  CXA_RECV5
    ${err_log_path}=  Catenate  ${RAS_LOG_DIR_PATH}cxafir_th5
    Inject Recoverable Error With Threshold Limit Through Host
    ...  ${value[0]}  ${value[1]}  5  ${value[2]}  ${err_log_path}

Verify Recoverable Callout Handling For CXA With Threshold 32
    [Documentation]  Verify recoverable callout handling for CXA with
    ...              threshold 32.
    [Tags]  Verify_Recoverable_Callout_Handling_For_CXA_With_Threshold_32

    ${value}=  Get From Dictionary  ${ERROR_INJECT_DICT}  CXA_RECV32
    ${err_log_path}=  Catenate  ${RAS_LOG_DIR_PATH}cxafir_th32
    Inject Recoverable Error With Threshold Limit Through Host
    ...  ${value[0]}  ${value[1]}  32  ${value[2]}  ${err_log_path}

# OBUSFIR related error injection.

Verify Recoverable Callout Handling For OBUS With Threshold 32
    [Documentation]  Verify recoverable callout handling for OBUS with
    ...              threshold 32.
    [Tags]  Verify_Recoverable_Callout_Handling_For_OBUS_With_Threshold_32

    ${value}=  Get From Dictionary  ${ERROR_INJECT_DICT}  OBUS_RECV32
    ${err_log_path}=  Catenate  ${RAS_LOG_DIR_PATH}obusfir_th32
    Inject Recoverable Error With Threshold Limit Through Host
    ...  ${value[0]}  ${value[1]}  32  ${value[2]}  ${err_log_path}

# Nvidia graphics processing units (NPU0FIR) related error injection.

Verify Recoverable Callout Handling For NPU0 With Threshold 32
    [Documentation]  Verify recoverable callout handling for NPU0 with
    ...              threshold 32.
    [Tags]  Verify_Recoverable_Callout_Handling_For_NPU0_With_Threshold_32

    ${value}=  Get From Dictionary  ${ERROR_INJECT_DICT}  NPU0_RECV32
    ${err_log_path}=  Catenate  ${RAS_LOG_DIR_PATH}npu0fir_th32
    Inject Recoverable Error With Threshold Limit Through Host
    ...  ${value[0]}  ${value[1]}  32  ${value[2]}  ${err_log_path}

# Nest accelerator NXDMAENGFIR related error injection.

Verify Recoverable Callout Handling For NXDMAENG With Threshold 1
    [Documentation]  Verify recoverable callout handling for NXDMAENG with
    ...              threshold 1.
    [Tags]  Verify_Recoverable_Callout_Handling_For_NXDMAENG_With_Threshold_1

    ${value}=  Get From Dictionary  ${ERROR_INJECT_DICT}  NX_RECV1
    ${err_log_path}=  Catenate  ${RAS_LOG_DIR_PATH}nxfir_th1
    Inject Recoverable Error With Threshold Limit Through Host
    ...  ${value[0]}  ${value[1]}  1  ${value[2]}  ${err_log_path}

Verify Recoverable Callout Handling For NXDMAENG With Threshold 32
    [Documentation]  Verify recoverable callout handling for NXDMAENG with
    ...              threshold 32.
    [Tags]  Verify_Recoverable_Callout_Handling_For_NXDMAENG_With_Threshold_32

    ${value}=  Get From Dictionary  ${ERROR_INJECT_DICT}  NX_RECV32
    ${err_log_path}=  Catenate  ${RAS_LOG_DIR_PATH}nxfir_th32
    Inject Recoverable Error With Threshold Limit Through Host
    ...  ${value[0]}  ${value[1]}  32  ${value[2]}  ${err_log_path}

Verify Unrecoverable Callout Handling For NXDMAENG
    [Documentation]  Verify unrecoverable callout handling for NXDMAENG.
    [Tags]  Verify_Unrecoverable_Callout_Handling_For_NXDMAENG

    ${value}=  Get From Dictionary  ${ERROR_INJECT_DICT}  NX_UE
    ${err_log_path}=  Catenate  ${RAS_LOG_DIR_PATH}nxfir_ue
    Inject Unrecoverable Error Through Host
    ...  ${value[0]}  ${value[1]}  1  ${value[2]}  ${err_log_path}

*** Keywords ***

Inject Error Through HOST
    [Documentation]  Inject checkstop on processor through HOST.
    ...              Test sequence:
    ...              1. Boot To HOST
    ...              2. Clear any existing gard records
    ...              3. Inject Error on processor/centaur
    [Arguments]  ${fir}  ${chip_address}  ${threshold_limit}
    # Description of argument(s):
    # fir                 FIR (Fault isolation register) value (e.g. 2011400).
    # chip_address        Chip address (e.g. 2000000000000000).
    # threshold_limit     Threshold limit (e.g. 1, 5, 32).

    Delete Error Logs
    Login To OS Host
    Gard Operations On OS  clear all

    # Fetch processor chip IDs.
    ${chip_ids}=  Get ProcChipId From OS  Processor
    ${proc_ids}=  Split String  ${chip_ids}
    ${proc_id}=  Get From List  ${proc_ids}  1

    ${threshold_limit}=  Convert To Integer  ${threshold_limit}
    # Inject the error threshold_limit times, with a delay after each
    # injection.
    :FOR  ${i}  IN RANGE  ${threshold_limit}
    \  Run Keyword  Putscom Operations On OS  ${proc_id}  ${fir}  ${chip_address}
    \  Sleep  10s
    # Adding delay to get error log after error injection.
    Sleep  120s

Verify And Clear Gard Records On HOST
    [Documentation]  Verify and clear gard records on HOST.

    ${output}=  Gard Operations On OS  list
    # An unrecoverable error is expected to leave at least one gard record.
    Should Not Contain  ${output}  No GARD entries to display
    Gard Operations On OS  clear all

Verify Error Log Entry
    [Documentation]  Verify error log entry & signature description.
    [Arguments]  ${signature_desc}  ${log_prefix}
    # Description of argument(s):
    # signature_desc      Error log signature description.
    # log_prefix          Log path prefix.

    ${resp}=  OpenBMC Get Request  ${BMC_LOGGING_ENTRY}/list
    Should Not Be Equal As Strings  ${resp.status_code}  ${HTTP_NOT_FOUND}

    Collect eSEL Log  ${log_prefix}
    ${error_log_file_path}=  Catenate  ${log_prefix}esel.txt
    ${rc}  ${output}=  Run And Return Rc And Output
    ...  grep -i ${signature_desc} ${error_log_file_path}
    Should Not Be Empty  ${output}

Inject Recoverable Error With Threshold Limit Through Host
    [Documentation]  Inject and verify recoverable error on processor through
    ...              host.
    ...              Test sequence:
    ...              1. Enable Auto Reboot Setting
    ...              2. Inject Error on processor/centaur
    ...              3. Check If HOST is running.
    ...              4. Verify that no gard records exist.
    ...              5. Verify error log entry & signature description.
    [Arguments]  ${fir}  ${chip_address}  ${threshold_limit}
    ...  ${signature_desc}  ${log_prefix}
    # Description of argument(s):
    # fir                 FIR (Fault isolation register) value (e.g. 2011400).
    # chip_address        Chip address (e.g. 2000000000000000).
    # threshold_limit     Threshold limit (e.g. 1, 5, 32).
    # signature_desc      Error log signature description.
    # log_prefix          Log path prefix.

    Set Auto Reboot  1
    Inject Error Through HOST  ${fir}  ${chip_address}  ${threshold_limit}
    Is Host Running
    ${output}=  Gard Operations On OS  list
    Should Contain  ${output}  No GARD
    Verify Error Log Entry  ${signature_desc}  ${log_prefix}

Inject Unrecoverable Error Through Host
    [Documentation]  Inject and verify unrecoverable error on processor
    ...              through host.
    ...              Test sequence:
    ...              1. Enable Auto Reboot Setting
    ...              2. Inject Error on processor/centaur
    ...              3. Check If HOST is rebooted.
    ...              4. Verify & clear gard records.
    ...              5. Verify error log entry & signature description.
    [Arguments]  ${fir}  ${chip_address}  ${threshold_limit}
    ...  ${signature_desc}  ${log_prefix}
    # Description of argument(s):
    # fir                 FIR (Fault isolation register) value (e.g. 2011400).
    # chip_address        Chip address (e.g. 2000000000000000).
    # threshold_limit     Threshold limit (e.g. 1, 5, 32).
    # signature_desc      Error log signature description.
    #                     (e.g. 'mcs(n0p0c0) (MCFIR[0]) mc internal recoverable')
    # log_prefix          Log path prefix.

    Set Auto Reboot  1
    Inject Error Through HOST  ${fir}  ${chip_address}  ${threshold_limit}
    Wait Until Keyword Succeeds  500 sec  20 sec  Is Host Rebooted
    Wait for OS
    Verify And Clear Gard Records On HOST
    Verify Error Log Entry  ${signature_desc}  ${log_prefix}

Fetch FIR Address Translation Value
    [Documentation]  Fetch FIR address translation value through HOST.
    [Arguments]  ${proc_chip_id}  ${fir}  ${target_type}
    # Description of argument(s):
    # proc_chip_id        Processor chip ID (e.g. '0', '8').
    # fir                 FIR (Fault isolation register) value (e.g. 2011400).
    # target_type         Target type (e.g. 'EX', 'EQ', 'C').

    Login To OS Host
    Copy Address Translation Utils To HOST OS

    ${core_ids}=  Get Core IDs From OS  0
    # Ignoring master core ID.
    ${output}=  Get Slice From List  ${core_ids}  1
    # Fetch random non-master core ID.
    ${core_ids_sub_list}=  Evaluate  random.sample(${output}, 1)  random
    ${core_id}=  Get From List  ${core_ids_sub_list}  0
    ${translated_fir_addr}=  FIR Address Translation Through HOST
    ...  ${fir}  ${core_id}  ${target_type}

    [Return]  ${translated_fir_addr}

RAS Test Setup
    [Documentation]  Validate input parameters and boot the host to OS.

    Should Not Be Empty
    ...  ${OS_HOST}  msg=You must provide DNS name/IP of the OS host.
    Should Not Be Empty
    ...  ${OS_USERNAME}  msg=You must provide OS host user name.
    Should Not Be Empty
    ...  ${OS_PASSWORD}  msg=You must provide OS host user password.

    # Boot to OS.
    REST Power On  quiet=${1}
    # Adding delay after host bring-up.
    Sleep  60s

RAS Suite Setup
    [Documentation]  Create RAS log directory to store all RAS test logs.

    ${RAS_LOG_DIR_PATH}=  Catenate  ${EXECDIR}/RAS_logs/
    Set Suite Variable  ${RAS_LOG_DIR_PATH}
    Create Directory  ${RAS_LOG_DIR_PATH}
    OperatingSystem.Directory Should Exist  ${RAS_LOG_DIR_PATH}
    Empty Directory  ${RAS_LOG_DIR_PATH}

RAS Suite Cleanup
    [Documentation]  Perform RAS suite cleanup and verify that host
    ...              boots after test suite run.

    # Boot to OS.
    REST Power On  quiet=${1}
    Delete Error Logs
    Gard Operations On OS  clear all