Storage Processor Qualifier (SPQ) User Guide

2008-08-04 18:00:10 | Category: Storage

 

 

Revision History

Date                 Version   Updated By         Comment
September 08, 2006   0.1       Vaijayanti Joshi   Draft
September 21, 2006   1.0       Vaijayanti Joshi   First Release
October 3, 2006      1.1       Marc St.Laurent    Clarifications
March 15, 2007       2.0       Nilesh Samant      Enhancements for V2.0
July 27, 2007        2.2       Jack Rourke        Enhancements for V2.2
November 15, 2007    2.3       Jack Rourke        Syntax edits


Table of Contents

1.     Introduction
1.1.      What is SPQ?
1.2.      How does SPQ work?
2.     Installation
3.     SPQ Pre-requisites
3.1.      Software Requirements
3.2.      Failover Requirements
3.3.      Hardware Requirements
3.4.      Connection Requirements
3.5.      Ktcons requirements
4.     How does SPQ Work?
5.     Detailed description of the SPQ interface
5.1.      Connection type
5.2.      IP Address
5.3.      Start Pushbutton
5.4.      Stop Pushbutton
5.5.      Report Pushbutton
5.6.      SP Qualifier Progress status
5.7.      Hyper Terminal Output
6.     SPQ detects the following SP issues/problems
6.1.      Unpingable SP and not servicing I/O
6.2.      Pingable SP but not servicing I/O
6.2.1.       SPQ detects whether an SP is in a degraded mode
6.2.2.       SPQ detects whether an SP is in a HFOFF mode
6.2.3.       SPQ detects whether there are any backend issues
6.2.4.       SPQ detects whether a controlled shutdown of the SP was initiated
6.2.5.       SPQ detects whether the FE ports are enabled or not
6.3.      Partially pingable SP
6.4.      Unpingable SP but servicing I/O
6.5.      Pingable SP, servicing I/O but Navi agent unmanaged
6.6.      ECC errors, Machine Check Exceptions and bugchecks on the SP
6.7.      Single SP Operations

 


SPQSustaining@emc.com

 

1.2.                    How does SPQ work?

·         Detects whether an SP is degraded, hung, or dead, or has experienced a reboot/bugcheck, and reports the SP's I/O state

·         Determines whether a hardware problem or a software problem is keeping the SP from functioning normally

·         If an SP is replaced, SPQ prompts service personnel to return the replaced SP, the SPQ report and the SPcollects to manufacturing for FA (Failure Analysis)

·         Support personnel submit SPQ reports to the next support level if the problem persists

 

2.   Installation

 

SPQ is packaged as an installable kit for a laptop or management station.

 

Unzip the file and double-click the executable SPQ_setup.exe. When the following screen appears, click Next. If Navicli is not installed on the system, setup fails and prompts the user to install Navicli.

 

Figure 1

Setup then asks for the installation path.

Figure 2

Install SPQ under the default path unless access restrictions on drive C: or a security policy prevent installation there.

 

To begin installation, click Install in the following screen.

Figure 3

SPQ will be automatically installed in the selected folder.

 

An SPQ icon is created on the desktop by default.

 

Figure 4

This figure shows the installation progress; let the process complete before proceeding.

Figure 5

3.    SPQ Pre-requisites

3.1.            Software Requirements: A compatible revision of Navicli must be installed on the service laptop or management station.

3.2.            Failover Requirements: SPQ assumes that failover is properly configured and that there are no HA (high availability) problems. If this is not the case, SPQ prompts the user to trespass the LUNs to the peer SP before further diagnosis. SPQ also assumes that Navicli is installed on the CE's laptop.

3.3.            Hardware Requirements: Serial cable(s) must be connected to the SP for a full diagnosis. SPQ performs most of its analysis remotely from the management station, but in certain situations it must be run from a CE's laptop over a serial connection. For this reason, and because an SP may need to be reseated, a CE must be present to run a complete SPQ test sequence.

3.4.            Connection Requirements: While SPQ is running, do not initiate a PPP connection manually; SPQ establishes the connection itself.

3.5.            Ktcons requirements: While SPQ is checking an SP, do not start a remote Ktcons session for that SP from any machine. An SP supports only one Ktcons session at a time, so SPQ may hang if a Ktcons session is already established.

 

4.   How does SPQ Work?

 

SPQ gathers the following statistics about both of the SPs:

  1. Is an SP consistently pingable for a certain period of time?
  2. Is the Navicli Getagent command returning valid SP information?
  3. Is the SP capable of servicing I/O?

SPQ relies upon the following existing array software to determine the SP state:

  1. Ktcons to capture remote Ktraces.
  2. Navicli commands to perform the analysis from a management station.
  3. Serial connectivity to the SP, used to capture the SP output at boot time. SPQ also uses a PPP connection to the SP to retrieve Navicli information when a CE is not allowed to access the customer network.
  4. SPQ uses documented POST errors to detect hardware issues with the SP.
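The three questions above can be read as a decision tree that leads to the problem categories described in section 6. The following Python sketch shows one possible encoding of that reading; it illustrates this guide's description and is not SPQ's actual logic. The names `SpObservation`, `ping_state` and `classify_sp` are hypothetical, as are the assumptions that "partially pingable" means a mix of successful and failed pings and that a failed `getagent` on a pingable SP servicing I/O corresponds to the "Navi agent unmanaged" case.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SpObservation:
    """Hypothetical summary of what SPQ gathers for one SP."""
    ping_results: List[bool]  # one entry per ping attempt over the observation period
    getagent_ok: bool         # did Navicli getagent return valid SP information?
    servicing_io: bool        # is the SP servicing I/O?


def ping_state(results: List[bool]) -> str:
    """Assumption: all replies = pingable, no replies = unpingable, a mix = partial."""
    if results and all(results):
        return "pingable"
    if not any(results):
        return "unpingable"
    return "partially pingable"


def classify_sp(obs: SpObservation) -> str:
    """Map one SP's observations onto the section 6 categories (illustrative only)."""
    state = ping_state(obs.ping_results)
    if state == "partially pingable":
        return "6.3 Partially pingable SP"
    if state == "unpingable":
        if obs.servicing_io:
            return "6.4 Unpingable SP but servicing I/O"
        return "6.1 Unpingable SP and not servicing I/O"
    # From here on the SP is consistently pingable.
    if not obs.servicing_io:
        return "6.2 Pingable SP but not servicing I/O"
    # Assumption: a failed getagent here is treated as "agent unmanaged".
    if not obs.getagent_ok:
        return "6.5 Pingable SP, servicing I/O but Navi agent unmanaged"
    return "6.6 Check for ECC errors, machine check exceptions and bugchecks"
```

For example, an SP that answers every ping, returns valid `getagent` data and services I/O falls through to the ECC and bugcheck checks of section 6.6.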


 

 

5.   Detailed description of the SPQ interface

 

When SPQ is launched, it shows the following login screen:

 

Figure 6

 

To enter the main screen, log in with the user name “clariion” and the password “clariion!”.

The main screen of the application looks like this:

 

Figure 7

 

5.1.                    Connection type:

On the main screen, the user selects the connection type between the array and the management station. The ‘Select available connection type’ dropdown box has five options:

 

1.         LAN: Select this type for a network connection to the SPs. In this case, SPQ performs remote analysis. If further diagnosis requires HyperTerminal output, SPQ suggests choosing a different connection type, either LAN & Serial or Serial.

 

 

2.         LAN & Serial: Select this type when both network and serial connectivity to the SPs are available from the same machine. The following screen shows a case in which neither SP is pingable; in this case, SPQ proceeds to capture the SP boot sequence on COM1.

 

3.         Serial Connection (Either to SPA or SPB): Select this type when only a serial connection to the SPs is available. In this mode SPQ captures the HyperTerminal output of the SP boot sequence to determine the faulty component.

At the same time, SPQ uses the PPP connection to capture remote Ktraces.

When a serial option is chosen and no PPP connection is present, SPQ displays the following screen asking for one to be established:

 

Figure 8

Select the COM port through which your system is serially connected to the SP, and then click the Verify button.

 

The New Connection Wizard then displays the following screen:

Figure 9

Select the option “Connect directly to another computer.”

The next screen asks you to confirm the connection port:

 

Figure 10

 

Choose the port to which the serial cable to the SP is connected.

On the next screen, enter the name of the connection. Keep the default connection name, “SPQ”:

Figure 11

The SPQ Properties screen is displayed next; click Configure.

 

Figure 12

On the next screen, select a baud rate of 115200 and click OK.

 

Figure 13

 

Now, SPQ is configured to use the PPP and Serial connection.

 

4. SP Collects: Select this connection type when no connection to the array is possible but SP Collects are available. Select SP Collects and click the Start pushbutton. When prompted for the location of the collect .zip files, navigate to the folder that contains them and select both zip files, or the single available file if collects are not available for both SPs. You must select the files whose names end with “_data.zip” (e.g. APM00051800822_SPA_2006-09-19_15-13-17_14d0d7_data.zip). Then click OK. SP Qualifier unzips the files if necessary, reads the available information from them, and produces a report.

Note: When the test is complete, the SP Qualifier report and log files are automatically added to the SP Collect zip file. Look for a .zip file named with the array serial number, e.g. APM00051800822_SPQualifier_SP_2006-09-19_15-13-17_14d0d7.zip.
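The file selection described in item 4 (keep only files ending in “_data.zip”, at most one per SP) can be sketched in Python as below. `find_collect_files` is a hypothetical helper, not part of SPQ, and it assumes, based only on the example file name above, that the SP appears in the name as an underscore-delimited `SPA` or `SPB` token.

```python
from pathlib import Path
from typing import Dict


def find_collect_files(folder: str) -> Dict[str, Path]:
    """Return the *_data.zip collect file found for each SP in folder.

    Hypothetical helper. Assumes names such as
    APM00051800822_SPA_2006-09-19_15-13-17_14d0d7_data.zip, where the SP
    name is its own underscore-delimited token. If several collects exist
    for one SP, the first one in sorted name order is kept.
    """
    found: Dict[str, Path] = {}
    for path in sorted(Path(folder).glob("*_data.zip")):
        tokens = path.name.split("_")
        for sp in ("SPA", "SPB"):
            if sp in tokens and sp not in found:
                found[sp] = path
    return found
```

The SP Qualifier output archive (`..._SPQualifier_SP_....zip`) does not end in `_data.zip`, so the glob pattern leaves it out.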

5.2.                    IP Address:

For LAN and LAN & Serial connectivity, input the IP addresses of the SPs. To qualify a single SP, enter just one IP address.

 

For Serial connectivity, both IP address controls will be disabled.
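The IP address rules above, together with the single-SP case of section 6.7, can be expressed as a small validation step. The sketch below uses Python's standard `ipaddress` module; the function name `sp_targets` and the connection-type strings are hypothetical, and connection types other than those named in this section are simply rejected.

```python
import ipaddress
from typing import List, Optional


def sp_targets(connection_type: str, spa_ip: Optional[str], spb_ip: Optional[str]) -> List[str]:
    """Return the SP addresses to qualify for the chosen connection type.

    Hypothetical helper: one address means a single-SP check, two a dual-SP check.
    """
    if connection_type == "Serial":
        return []  # both IP address controls are disabled for serial connectivity
    if connection_type not in ("LAN", "LAN & Serial"):
        raise ValueError(f"connection type not covered here: {connection_type}")
    entered = [ip.strip() for ip in (spa_ip, spb_ip) if ip and ip.strip()]
    if not entered:
        raise ValueError("enter at least one SP IP address")
    # ipaddress.ip_address raises ValueError for a malformed address.
    return [str(ipaddress.ip_address(ip)) for ip in entered]
```

Returning a list keeps the single-SP and dual-SP cases on the same code path: a caller can treat a one-element result as the single-SP mode.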

 

5.3.                    Start Pushbutton:

After selecting the connection type and entering the IP addresses, click the Start pushbutton. SPQ then starts analyzing the status of the SPs. In some cases SPQ takes 8 to 10 minutes to complete qualification.

5.4.                    Stop Pushbutton:

Clicking Stop at any point terminates the SPQ analysis.

5.5.                    Report Pushbutton:

When its analysis is complete, SPQ generates a report and immediately displays it in a pop-up window.

 

To save the report to disk, click the Report pushbutton. This opens a Save dialog box.

 

The report file name is generated as ArraySerialNumber_SPQReport_date_time.txt.

An example is APM00031201788_SPQReport_2006-8-29_22-22-31.txt.
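A report file name in this format can be split back into its parts, for instance when sorting saved reports by array. The Python sketch below assumes, from the single example above, that the serial number consists of uppercase letters and digits and that date and time fields may be written without zero padding; `parse_report_name` is a hypothetical helper.

```python
import re
from datetime import datetime
from typing import Tuple

# ArraySerialNumber_SPQReport_date_time.txt, e.g.
# APM00031201788_SPQReport_2006-8-29_22-22-31.txt (month/day may be unpadded).
REPORT_NAME = re.compile(
    r"^(?P<serial>[A-Z0-9]+)_SPQReport_"
    r"(?P<date>\d{4}-\d{1,2}-\d{1,2})_(?P<time>\d{1,2}-\d{1,2}-\d{1,2})\.txt$"
)


def parse_report_name(name: str) -> Tuple[str, datetime]:
    """Return (array serial number, report timestamp) from an SPQ report file name."""
    m = REPORT_NAME.match(name)
    if m is None:
        raise ValueError(f"not an SPQ report file name: {name}")
    year, month, day = (int(x) for x in m.group("date").split("-"))
    hour, minute, second = (int(x) for x in m.group("time").split("-"))
    return m.group("serial"), datetime(year, month, day, hour, minute, second)
```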

 

Note: When SP Qualifier is run against a live array, an additional pushbutton with the caption “Add Report to Collects” is displayed. Click this button to add the SP Qualifier report and log files to the existing collect files for the array.

Sample Report:

Figure 14

5.6.                    SP Qualifier Progress status:

SPQ continuously displays the run-time progress of its analysis in this progress window.

 

5.7.                    Hyper Terminal Output:

When an SP is stuck during boot, SPQ captures the SP's boot sequence and displays it in a Hyper Terminal Output window. This requires serial connectivity to the SP.

Before it starts capturing the HyperTerminal output, SPQ displays the following message box to make sure that the SP is not servicing I/O, because an SP that is already servicing I/O should not be reseated. If Yes is selected, SPQ terminates the application.

Figure 15

 

SPQ expects the user to click OK and then reseat the SP. SPQ then starts capturing the HyperTerminal output.

Figure 16

6.   SPQ detects the following SP issues/problems:

6.1.                    Unpingable SP and not servicing I/O

Diagnosis is done by capturing the boot sequence of the SP on the serial port.

·        The boot sequence shows whether the SP is stuck

·        SPQ captures the specific error code

·        Depending on the error code, SPQ suggests whether or not to replace the SP

·        If neither SP is pingable, the following prompt is shown to make sure that there are no network issues at the customer site.

Figure 17

6.2.                    Pingable SP but not servicing I/O

Diagnosis is done by capturing remote Ktraces.

·        In most of these situations, SPQ suggests NOT replacing the SP and provides additional information on the type of error.

 

6.2.1.   SPQ detects whether an SP is in a degraded mode.

In this case, the SPQ output screen looks like this:

Indication that the SP is in degraded mode.

Figure 18

 

 

6.2.2.   SPQ detects whether an SP is in a HFOFF mode.

In this case, the SPQ output screen looks like this:

Indicates that the SP is in a HFOFF mode.

Array serial number, SP serial numbers and Flare revision.

Figure 19

6.2.3.   SPQ detects whether there are any backend issues

Examples: shunted enclosures, failed enclosures, removed enclosures, rebuild problems on NTMirror drives, etc.

6.2.4.   SPQ detects whether a controlled shutdown of the SP was initiated.

6.2.5.   SPQ detects whether the FE ports are enabled or not.

6.3.                    Partially pingable SP

Diagnosis is done by capturing remote Ktraces of the SP.

·        Detects whether an SP is in a reboot loop due to recursive bugchecks.

·        SPQ may suggest NOT replacing the SP and will provide information on the bugchecks.

The SPQ output screen for this case is shown below; note that it indicates that SPA is in a reboot loop. SPQ has to distinguish two situations: in the first, the SP bugchecks four times and then goes into degraded mode; in the second, the SP is in a continuous rolling reboot. To tell them apart, SPQ captures four reboots of the SP and then decides accordingly. This is the most time-consuming scenario, and SPQ may take up to 10 minutes to complete the analysis, so please be PATIENT!

 

Indication that the SP is in a reboot loop.

Figure 20
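The reboot-loop decision described in section 6.3 can be sketched as follows. This is an illustration under stated assumptions, not SPQ's implementation: it assumes SPQ has recorded an ordered list of observed SP events, and the event names (`"reboot"`, `"degraded"`) and the function `classify_reboot_pattern` are hypothetical. After the fourth reboot, a transition to degraded mode points to the first situation, while a further reboot points to a continuous rolling reboot.

```python
from typing import List

REBOOTS_TO_CAPTURE = 4  # the guide says SPQ captures four reboots before deciding


def classify_reboot_pattern(events: List[str]) -> str:
    """Classify an ordered list of hypothetical events such as "reboot" or "degraded"."""
    reboots = 0
    for i, event in enumerate(events):
        if event != "reboot":
            continue
        reboots += 1
        if reboots == REBOOTS_TO_CAPTURE:
            following = events[i + 1:]
            if not following:
                return "undecided: keep capturing"
            if following[0] == "degraded":
                return "bugchecked four times, now in degraded mode"
            if following[0] == "reboot":
                return "continuous rolling reboot"
            return "undecided: unexpected state after fourth reboot"
    return "undecided: fewer than four reboots captured"
```

Waiting for the event after the fourth reboot is what makes this the slowest scenario: the decision cannot be made until the SP has cycled at least four times.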

 

 

 

6.4.                    Unpingable SP but servicing I/O

Diagnosis is done by executing Navicli commands from the peer SP.

·        Detects whether the LAN port/cable is bad

·        SPQ suggests NOT replacing the SP in these situations and provides more information on the corrective action

6.5.                    Pingable SP, servicing I/O but Navi agent unmanaged

·        SPQ suggests NOT replacing the SP in this situation and provides more information on the corrective action

6.6.                    ECC errors, Machine Check Exceptions and bugchecks on the SP.

When both SPs are pingable, managed, and servicing I/O, SPQ detects whether either SP has experienced ECC errors or bugchecks.

Diagnosis is done by executing Navicli commands for each SP.

·        Detects whether either SP bugchecked in the last 7 days

·        Detects whether there were any ECC errors on the SP. SPQ suggests replacement for multi-bit ECC errors. For single-bit ECC errors, SPQ suggests monitoring the SP for 50 such errors within a one-week timeframe.

·        SPQ suggests NOT replacing the SP for bugchecks and provides information on the bugcheck. The only exception is a machine check exception bugcheck (bugcheck # 9c): if the SP has experienced this bugcheck, SPQ suggests replacement.

The SPQ output screen is shown below:

Look for the bugcheck here.

            

Figure 21
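The replacement guidance in section 6.6 can be summarized as a small rule function. This Python sketch restates the rules as written in this guide and is not SPQ's implementation; the `SpHealth` fields and the `recommend` function are hypothetical. Because the guide only says to monitor the SP for 50 single-bit ECC errors in a week, the sketch reports that threshold rather than deciding what happens once it is reached.

```python
from dataclasses import dataclass, field
from typing import List

MCE_BUGCHECK = 0x9C            # machine check exception bugcheck (# 9c)
SINGLE_BIT_ECC_THRESHOLD = 50  # single-bit ECC errors per week to watch for


@dataclass
class SpHealth:
    """Hypothetical per-SP findings gathered with Navicli."""
    multi_bit_ecc_errors: int = 0
    single_bit_ecc_errors: int = 0
    bugchecks_last_7_days: List[int] = field(default_factory=list)  # bugcheck codes


def recommend(h: SpHealth) -> List[str]:
    """Return the suggestions this section describes for one SP."""
    advice: List[str] = []
    if h.multi_bit_ecc_errors > 0:
        advice.append("replace SP: multi-bit ECC errors")
    if h.single_bit_ecc_errors > 0:
        advice.append(
            f"monitor SP: watch for {SINGLE_BIT_ECC_THRESHOLD} single-bit ECC errors in a week"
        )
    for code in h.bugchecks_last_7_days:
        if code == MCE_BUGCHECK:
            advice.append("replace SP: machine check exception bugcheck 0x9C")
        else:
            advice.append(f"do not replace SP: bugcheck 0x{code:X}, see bugcheck details")
    return advice
```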

 

 

6.7.                    Single SP Operations.

SPQ can also analyze the state of a single SP. Enter a single SP IP address; SPQ then treats the second SP as a dummy SP. If the single SP is not pingable at all, SPQ first prompts the user to check for customer network issues, as in the dual-SP scenario.
