GPU test exerciser
Resolves openbmc/openbmc-test-automation#625
Change-Id: Id005659f512c26cae5342b22a844e912f81f6cd1
Signed-off-by: George Keishing <gkeishin@in.ibm.com>
diff --git a/syslib/utils_os.robot b/syslib/utils_os.robot
index 9e7d36b..a222791 100755
--- a/syslib/utils_os.robot
+++ b/syslib/utils_os.robot
@@ -47,7 +47,7 @@
     # TODO: Generalize alias naming using openbmc/openbmc-test-automation#633
     Ping Host  ${os_host}
-    Open Connection  ${os_host}  alias=${alias_name}
+    SSHLibrary.Open Connection  ${os_host}  alias=${alias_name}
     Login  ${os_username}  ${os_password}
@@ -168,6 +168,9 @@
Collect NVIDIA Log File
[Documentation] Collect ndivia-smi command output.
+    [Arguments]  ${suffix}
+    # Description of argument(s):
+    # suffix  String to append to the log file name.
# Collects the output of ndivia-smi cmd output.
# TODO: GPU current temperature threshold check.
@@ -206,7 +209,8 @@
     ${nvidia_out}=  Execute Command On BMC  nvidia-smi
     Write Log Data To File
-    ...  ${nvidia_out}  ${htx_log_dir_path}/${OS_HOST}_${cur_datetime}.nvidia
+    ...  ${nvidia_out}
+    ...  ${htx_log_dir_path}/${OS_HOST}_${cur_datetime}.nvidia_${suffix}
Pre Test Case Execution
diff --git a/systest/gpu_stress_test.robot b/systest/gpu_stress_test.robot
new file mode 100644
index 0000000..1ca0316
--- /dev/null
+++ b/systest/gpu_stress_test.robot
@@ -0,0 +1,77 @@
+*** Settings ***
+Documentation    Stress the system using HTX exerciser.
+
+Resource         ../syslib/utils_os.robot
+
+Test Setup       Pre Test Case Execution
+Test Teardown    Post Test Case Execution
+
+*** Variables ***
+
+${stack_mode}    skip
+
+*** Test Cases ***
+
+GPU Stress Test
+    [Documentation]  Stress the GPU using HTX exerciser.
+    [Tags]  GPU_Stress_Test
+
+    Rprintn
+    Rpvars  HTX_DURATION  HTX_INTERVAL
+
+    Repeat Keyword  ${HTX_LOOP} times  Execute GPU Test
+
+
+*** Keywords ***
+
+Execute GPU Test
+    [Documentation]  Run one GPU stress iteration with the HTX exerciser.
+    # Test Flow:
+    #   - Power on
+    #   - Establish SSH connection session
+    #   - Collect GPU nvidia status output
+    #   - Create HTX mdt profile
+    #   - Run GPU specific HTX exerciser
+    #   - Check HTX status for errors
+
+    # Collect data before the test starts.
+    Collect NVIDIA Log File  start
+
+    Run Keyword If  '${HTX_MDT_PROFILE}' == 'mdt.bu'
+    ...  Create Default MDT Profile
+
+    Run MDT Profile
+
+    Loop HTX Health Check
+
+    # After the test loop, check the OS dmesg log for errors.
+    Check For Errors On OS Dmesg Log
+
+    Shutdown HTX Exerciser
+
+    Rprint Timen  HTX Test ran for: ${HTX_DURATION}
+
+
+Loop HTX Health Check
+    [Documentation]  Check HTX status repeatedly for HTX_DURATION or until it fails.
+
+    Repeat Keyword  ${HTX_DURATION}
+    ...  Run Keywords  Check HTX Run Status
+    ...  AND  Sleep  ${HTX_INTERVAL}
+
+
+Post Test Case Execution
+    [Documentation]  Do the post test teardown.
+    # 1. Shut down HTX exerciser if the test failed.
+    # 2. Capture FFDC on test failure.
+    # 3. Close all open SSH connections.
+
+    # Keep HTX running if user set HTX_KEEP_RUNNING to 1.
+    Run Keyword If  '${TEST_STATUS}' == 'FAIL' and ${HTX_KEEP_RUNNING} == ${0}
+    ...  Shutdown HTX Exerciser
+
+    # Collect nvidia-smi output data on exit.
+    Collect NVIDIA Log File  end
+
+    FFDC On Test Case Fail
+    Close All Connections