Transport or move the physical drives of a disk array to different slots in the same VTrak E-Class enclosure, or from one VTrak enclosure to another

Last Post 25 March 2008

Ken Chou posted this 25 March 2008

Transport Procedure for VTrak E-Class

Version 1.0 - January 18th, 2008

Transport is the action of moving the physical drives of a disk array:

To different slots in the same VTrak enclosure
From one VTrak enclosure to another

Promise recommends that you use the CLI, not WebPAM PRO, to perform a Transport.

Step 1: Preparing the Disk Array for Transport

Before you can use the Transport feature, you must verify normal operation of the:

Disk array
Logical drives
Source RAID controllers
Target RAID controllers
JBOD IO modules

Connecting to the RAID Head

Connect to the Source RAID Head controller via Telnet or HyperTerminal.
Verify that the source array’s Operational Status is OK.
At the command line, type array -v and press Enter.
Verify that the logical drive’s Operational Status is OK.
At the command line, type logdrv -v and press Enter.
Verify that no background activities are running on the disk array or logical drive.
At the command line, type bga and press Enter.
If you are moving the disk array to a different enclosure, verify that the RAID controllers in the target enclosure are:
Running firmware 3.28.0000.00 or the prescribed firmware build (3.29.0000.00 for Apple users who purchased the product from the Apple Store)
Running in Active/Active mode
Reporting an Operational Status of OK
Not reporting any link errors

At the command line, type ctrl -v and press Enter.

Connecting to JBOD IO Modules

Connect to each JBOD IO module via the RJ11 console.
Verify that each JBOD IO module is running SEP firmware 1.07.0000.03 or newer (1.07.0000.04 for Apple users who purchased the product from the Apple Store).
At the command line, type enclosure and press Enter.
Verify that each JBOD IO module is free from link errors.
Each JBOD enclosure has two IO modules.
At the command line, type link and press Enter.
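Checking the "1.07.0000.03 or newer" rule by eye is easy to get wrong, because each dotted field is a separate number. A minimal sketch, assuming the SEP version is read from the enclosure output as a plain dotted string (the function names are hypothetical, not part of the VTrak CLI), compares the fields numerically:

```python
def parse_version(text):
    """Split a dotted firmware version such as "1.07.0000.03" into integers."""
    return tuple(int(part) for part in text.strip().split("."))


def meets_minimum(installed, minimum):
    """True if `installed` is the same as or newer than `minimum`."""
    return parse_version(installed) >= parse_version(minimum)


# Standard minimum compared with the Apple Store build and an older build.
print(meets_minimum("1.07.0000.04", "1.07.0000.03"))  # True
print(meets_minimum("1.06.0000.09", "1.07.0000.03"))  # False
```

This only fits the SEP rule; the RAID controller requirement names specific builds rather than a minimum.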

The example below shows link command output that is free of link errors:

cli:> link
Link Status:
	Port	Type	Rate	Init	Dev	Link	PRdy
P 0	D01	SATA	3.0G	OK	End	----	Rdy
P 1	D02	SATA	3.0G	OK	End	----	Rdy
P 2	D03	SATA	3.0G	OK	End	----	Rdy
P 3	D04	SATA	3.0G	OK	End	----	Rdy
P 4	D05	SATA	3.0G	OK	End	----	Rdy
P 5	D06	SATA	3.0G	OK	End	----	Rdy
P 6	D07	SATA	3.0G	OK	End	----	Rdy
P 7	D08	SATA	3.0G	OK	End	----	Rdy
P 8	D09	SATA	3.0G	OK	End	----	Rdy
P 9	D10	SATA	3.0G	OK	End	----	Rdy
P10	D11	SATA	3.0G	OK	End	----	Rdy
P11	D12	SATA	3.0G	OK	End	----	Rdy
P12	D13	SATA	3.0G	OK	End	----	Rdy
P13	D14	SATA	3.0G	OK	End	----	Rdy
P14	D15	SATA	3.0G	OK	End	----	Rdy
P15	D16	SATA	3.0G	OK	End	----	Rdy
P16	CN1	SAS	3.0G	OK	Exp	----	Rdy
P17	CN1	SAS	3.0G	OK	Exp	----	Rdy
P18	CN1	SAS	3.0G	OK	Exp	----	Rdy
P19	CN1	SAS	3.0G	OK	Exp	----	Rdy
P20	CN2	SAS	3.0G	OK	Exp	----	Rdy
P21	CN2	SAS	3.0G	OK	Exp	----	Rdy
P22	CN2	SAS	3.0G	OK	Exp	----	Rdy
P23	CN2	SAS	3.0G	OK	Exp	----	Rdy

Port:Port Id	Type:SAS or SATA	Rate:Rate 1.5G/3G
Init:Init Passed	Dev :Device Type	Link:Link Connected
PRdy:Phy Ready

Link Counter:
	InDW	DsEr	DwLo	PhRe	CoVi	PhCh
P 0	----------	----------	----------	----------	----------	0x0B
P 1	----------	----------	----------	----------	----------	0x0B
P 2	----------	----------	----------	----------	----------	0x0B
P 3	----------	----------	----------	----------	----------	0x0B
P 4	----------	----------	----------	----------	----------	0x0B
P 5	----------	----------	----------	----------	----------	0x0B
P 6	----------	----------	----------	----------	----------	0x0B
P 7	----------	----------	----------	----------	----------	0x0B
P 8	----------	----------	----------	----------	----------	0x0B
P 9	----------	----------	----------	----------	----------	0x0B
P10	----------	----------	----------	----------	----------	0x0B
P11	----------	----------	----------	----------	----------	0x0B
P12	----------	----------	----------	----------	----------	0x0B
P13	----------	----------	----------	----------	----------	0x0B
P14	----------	----------	----------	----------	----------	0x0B
P15	----------	----------	----------	----------	----------	0x0B
P16	----------	----------	----------	----------	----------	0x01
P17	----------	----------	----------	----------	----------	0x01
P18	----------	----------	----------	----------	----------	0x01
P19	----------	----------	----------	----------	----------	0x01
P20	----------	----------	----------	----------	----------	0x01
P21	----------	----------	----------	----------	----------	0x01
P22	----------	----------	----------	----------	----------	0x01
P23	----------	----------	----------	----------	----------	0x01

InDW:Invalid Dword Count	DsEr:Disparity Err Count
DwLo:Dword Sync Loss Count	PhRe:Phy Reset Problem Count
CoVi:Code Violations Cnt	PhCh:Phy Change Count
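Reading 24 rows of counters by eye is error-prone. The sketch below parses the "Link Counter:" section into numbers, assuming the tab-separated layout shown above, that a run of dashes means a zero count, and that other values are hexadecimal. Because the example above is described as free of link errors even though PhCh reads 0x0B and 0x01, the sketch ignores PhCh by default; treat that as an assumption, not a Promise rule. The function names are hypothetical.

```python
import re

# Counter columns, in the order the `link` command prints them.
COUNTERS = ["InDW", "DsEr", "DwLo", "PhRe", "CoVi", "PhCh"]


def parse_link_counters(text):
    """Return {port_number: {counter: value}} from the "Link Counter:" section."""
    counters = {}
    in_section = False
    for line in text.splitlines():
        if line.startswith("Link Counter:"):
            in_section = True
            continue
        match = re.match(r"P\s*(\d+)\s+(.*)", line) if in_section else None
        if not match:
            continue
        fields = match.group(2).split()
        if len(fields) != len(COUNTERS):
            continue
        row = {}
        for name, field in zip(COUNTERS, fields):
            # Assumption: a run of dashes means zero; other values are hex.
            row[name] = 0 if set(field) == {"-"} else int(field, 16)
        counters[int(match.group(1))] = row
    return counters


def ports_with_errors(counters, ignore=("PhCh",)):
    """Ports with any nonzero counter other than those listed in `ignore`."""
    return sorted(
        port
        for port, row in counters.items()
        if any(value for name, value in row.items() if name not in ignore)
    )


sample = """Link Counter:
\tInDW\tDsEr\tDwLo\tPhRe\tCoVi\tPhCh
P 0\t----------\t----------\t----------\t----------\t----------\t0x0B
P16\t----------\t----------\t----------\t----------\t----------\t0x01
"""
print(ports_with_errors(parse_link_counters(sample)))  # []
```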

Step 2: Interpreting Link Errors

If your system has no link errors, skip to “Step 4: Transporting a Disk Array.”

Link errors may be observed on P0 through P15, the drive ports. These ports are not the main area of interest, but you may want to take corrective action. The link counters that may increment are:

(InDW) Invalid Dword Count
(DsEr) Disparity Err Count
(DwLo) Dword Sync Loss Count
(PhRe) Phy Reset Problem Count
(CoVi) Code Violations Count
(PhCh) Phy Change Count

These errors can be isolated cases that occur when a physical drive times out, resets, or encounters read/write errors, or when an AAMUX adapter is bad.

Clear the link errors to see whether the link counters increment their hexadecimal values again.
At the command line, type link -a clear and press Enter.
Then type link and press Enter.

Corrective action on a physical drive might also require a rebuild of the disk array to which that drive belongs.
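The clear-and-recheck test amounts to comparing two readings of the counters. A minimal sketch, assuming each reading has already been turned into a dictionary that maps port numbers to counter values (a hypothetical representation; the CLI itself prints text), reports which counters grew:

```python
def incremented_counters(before, after):
    """Return {port: [counter names]} for counters that grew between two readings."""
    grown = {}
    for port, current in after.items():
        previous = before.get(port, {})
        names = [name for name, value in current.items()
                 if value > previous.get(name, 0)]
        if names:
            grown[port] = names
    return grown


# Reading taken right after `link -a clear`, then a later reading.
cleared = {3: {"InDW": 0, "DsEr": 0}, 17: {"InDW": 0, "DsEr": 0}}
later = {3: {"InDW": 0, "DsEr": 0}, 17: {"InDW": 4, "DsEr": 1}}
print(incremented_counters(cleared, later))  # {17: ['InDW', 'DsEr']}
```

A port that shows up again after clearing is the one worth investigating; a one-time count that does not return matches the "isolated case" described above.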

Focusing on Critical Links

The main area of interest is the link counters for P16 through P23. Errors on these ports can affect the Transport operation, or can cause the RAID Head controller IO modules to break a path and put a controller into Maintenance Mode.

The link error counts may increment when you issue the link command. These ports correspond to the physical connectors labeled CN1 and CN2 on the JBOD IO module.

See page 23, Figure 17 for connector assignments and page 37 for additional information on the link command output in the VTrak J-Class Product Manual: http://www.promise.com/upload/Support/Manual/VTrak_J610s_J310s_PM_v1.0a.pdf.

If link errors are detected:

Clear the link error.
At the command line, type link -a clear and press Enter.
Check to see if the link error comes back.
At the command line, type link and press Enter.
If errors return, identify the source of the link error.
- CN1 = P16 through P19
- CN2 = P20 through P23
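The port-to-connector rule above, together with the drive mapping visible in the Link Status example (P0 on D01 through P15 on D16), can be written as a small lookup. This is a sketch assuming the 16-drive, 24-port layout of that example; other J-Class models may differ, and the function name is hypothetical.

```python
def link_source(port):
    """Map a `link` port number to the drive slot or connector it serves."""
    if 0 <= port <= 15:
        # Drive ports: P0 is D01, ..., P15 is D16 in the example output.
        return "D%02d" % (port + 1)
    if 16 <= port <= 19:
        return "CN1"
    if 20 <= port <= 23:
        return "CN2"
    raise ValueError("port %d is outside P0-P23" % port)


for port in (0, 15, 17, 22):
    print("P%d -> %s" % (port, link_source(port)))
```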

Step 3: Correcting Link Errors

After you have identified the source of the link errors, you must Fail Over the affected SAS domain before you can take corrective action.

Pull the RAID controller for the affected SAS domain from the enclosure.
See diagram below:

When the RAID controller is removed from the enclosure, all I/O resumes on the remaining RAID controller's SAS domain. Controller Fail Over is almost instantaneous.
Verify the controller Fail Over from the remaining RAID controller.
Using Telnet or HyperTerminal, type ctrl at the command line and press Enter.
In the example below, note that controller 2 is no longer present.

administrator@cli> ctrl
===================================================
CId	Alias	OpStatus	Readiness Status
===================================================
1		OK	Active
2	N/A	Not Present	N/A

Check the RAID controller CLI event logs to verify that there are no other problems.
At the command line, type event -l nvram and press Enter.
Then type event -l and press Enter.
Find and correct the root cause of the link error.
A link error can be caused by:
- Faulty SAS cable: Replace a suspect cable with a known-good cable.
- Debris blocking the SAS cable connector: Visually inspect and clean the connector.
- Bad IO module CN1 or CN2 connector: Check this online after other possibilities are eliminated. At the command line, type sasdiag -a errorlog -l c2cport and press Enter. Look for incrementing errors.
When you have corrected the root cause of the link errors on P16 through P23 on the respective IO modules, verify that all SAS cables are properly connected.
Insert the RAID controller back into the enclosure and restore the SAS connections to the host.
When the RAID controller is reinserted and all paths are restored, the RAID Head will Fail Back and return to Active/Active mode. Fail Back can take up to one minute after the RAID controller is inserted and all SAS connections are restored.
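After you pull a controller, the ctrl summary from the remaining controller should show the pulled one as Not Present, as in the example earlier in this step. A sketch of that check, assuming the columns are tab-separated exactly as shown there (real CLI output may pad with spaces, in which case the empty Alias column needs different handling); the function names are hypothetical.

```python
def parse_ctrl(text):
    """Return {controller_id: (op_status, readiness)} from `ctrl` summary output."""
    controllers = {}
    for line in text.splitlines():
        fields = [field.strip() for field in line.split("\t")]
        if len(fields) >= 4 and fields[0].isdigit():
            controllers[int(fields[0])] = (fields[2], fields[3])
    return controllers


def failed_over(controllers, survivor, removed):
    """True if `survivor` is OK and `removed` is reported as Not Present."""
    return (controllers.get(survivor, ("", ""))[0] == "OK"
            and controllers.get(removed, ("", ""))[0] == "Not Present")


sample = "CId\tAlias\tOpStatus\tReadiness Status\n1\t\tOK\tActive\n2\tN/A\tNot Present\tN/A\n"
print(failed_over(parse_ctrl(sample), survivor=1, removed=2))  # True
```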

To verify that the RAID Head is in Active/Active mode, do one of the following actions at the command line:

Type ctrl -v and press Enter.
Type event -l nvram and press Enter.
Type event -l and press Enter.
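Because Fail Back can take up to a minute, a check made too early may see only one active controller. A polling sketch, assuming Active/Active shows up as every controller reporting a Readiness Status of Active in the ctrl summary, and that the caller supplies a hypothetical function returning those values (how it reaches the CLI is left open):

```python
import time


def wait_for_active_active(get_readiness, timeout=60.0, interval=5.0,
                           clock=time.monotonic, sleep=time.sleep):
    """Poll until two or more controllers report Active, or until `timeout` seconds pass.

    `get_readiness` is a caller-supplied function (hypothetical) that runs
    `ctrl` and returns {controller_id: readiness_status}.
    """
    deadline = clock() + timeout
    while True:
        readiness = get_readiness()
        if len(readiness) >= 2 and all(s == "Active" for s in readiness.values()):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Simulated readings: controller 2 comes back on the second poll.
readings = iter([{1: "Active", 2: "N/A"}, {1: "Active", 2: "Active"}])
print(wait_for_active_active(lambda: next(readings), interval=0))  # True
```

The clock and sleep functions are parameters only so the loop can be exercised without waiting a real minute.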

When the RAID Head is back to normal, repeat the checks beginning with “Connecting to the RAID Head” to verify that the system is free of link errors.

If link errors are reported, repeat the procedure beginning with “Connecting to JBOD IO Modules” until you have eliminated all link errors on CN1 (P16 through P19) and CN2 (P20 through P23).
If no link errors are reported, proceed to “Step 4: Transporting a Disk Array.”

Step 4: Transporting a Disk Array

This step is the actual operation of transporting the physical drives of a disk array from one location to another.

Connect to the Target RAID Head controller via Telnet or HyperTerminal.
Place the disk array into Transport mode.
At the command line, type array -a transport -d 0 and press Enter, where 0 is the disk array ID.
For proper syntax, type ? array and press Enter, or see the CLI User Manual.
Move four physical drives at a time from the Source enclosure to the Target enclosure.
Keep the physical drives in exactly the same order and sequence.

Verify that all controllers have discovered each of the transported physical drives.
At the command line, type phydrv -v -pX and press Enter,
where X is the physical drive ID (PdId).

See the example below, in particular the VisibleTo field:

Administrator@cli> phydrv -v -p1
-----------------------------------------------------------------
PdId: 1
OperationalStatus: OK
Alias:
PhysicalCapacity: 153.39GB	ConfigurableCapacity: 152.74GB
UsedCapacity: 83.33GB	BlockSize: 512Bytes
ConfigStatus: Array0 SeqNo0	Location: Encl1 Slot1
ModelNo: ATA Hitachi HDS72161	VisibleTo: All Controllers
SerialNo: PVB300Z2R23RBD	FirmwareVersion: P22OA70A
DriveInterface: SATA 3Gb/s	Protocol: ATA/ATAPI-7
WriteCacheSupport: Yes	WriteCache: Enabled
RLACacheSupport: Yes	RLACache: Enabled
SMARTFeatureSetSupport: Yes
SMARTSelfTestSetSupport: Yes
SMARTErrorLoggingSupport: Yes
CmdQueuingSupport: NCQ	CmdQueuing: Enabled
CmdQueueDepth: 16	MultiDMASupport: MDMA2
UltraDMASupport: UDMA5	DMAMode: UDMA5
Errors: 0	NonRWErrors: 0
ReadErrors: 0	WriteErrors: 0
DriveTemperature: N/A	ReferenceDriveTemperature: N/A

After all physical drives in the disk array have been moved to their new locations, verify that the logical drive’s Operational Status is OK.
At the command line, type logdrv -v and press Enter.
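For the step of moving four physical drives at a time while keeping them in the same order, a written plan helps. A sketch, assuming the order to preserve is the SeqNo that phydrv -v reports in its ConfigStatus field (for example, Array0 SeqNo0), with each drive given as a hypothetical (SeqNo, slot) pair:

```python
def transport_batches(members, batch_size=4):
    """Group (seq_no, slot) pairs into ordered batches of at most `batch_size` slots."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    ordered = [slot for _, slot in sorted(members)]
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]


# Hypothetical six-drive array: (SeqNo, source slot).
members = [(2, 3), (0, 1), (5, 6), (1, 2), (4, 5), (3, 4)]
for number, batch in enumerate(transport_batches(members), start=1):
    print("Batch %d: slots %s" % (number, batch))
```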

Troubleshooting

If a physical drive is NOT visible to all controllers when you run the phydrv -v -pX command, or is not present when you run the array -v or logdrv -v commands, the logical drive’s Operational Status will be Offline.

Take the following actions:

Remove the physical drive that is not visible to all controllers from the enclosure.
Replace the AAMUX adapter.
Reinsert the physical drive into its slot.
Wait one minute for the RAID controllers to detect the physical drive.
At the command line, type phydrv -v -pX and press Enter.
- If the physical drive is “Visible to All Controllers,” verify that the logical drive’s Operational Status is OK. This completes the procedure.
- If the physical drive is NOT reported as “Visible to All Controllers,” contact your Promise FAE for assistance.
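The checks in this troubleshooting procedure depend on the VisibleTo and OperationalStatus fields of the phydrv -v output. A sketch that extracts those fields, assuming the Key: Value pairs are separated by tabs as in the example in Step 4; the function names are hypothetical, and the sketch only reads text, it does not talk to the controller.

```python
def parse_phydrv(text):
    """Return {field: value} from `phydrv -v` output of "Key: Value" pairs."""
    fields = {}
    for line in text.splitlines():
        for chunk in line.split("\t"):
            key, sep, value = chunk.partition(":")
            # Strip a separator rule fused onto the first key, if present.
            key = key.strip().lstrip("-")
            if sep and key:
                fields[key] = value.strip()
    return fields


def visible_to_all(fields):
    """True if the drive is OK and reported as visible to all controllers."""
    return (fields.get("OperationalStatus") == "OK"
            and fields.get("VisibleTo") == "All Controllers")


sample = (
    "PdId: 1\n"
    "OperationalStatus: OK\n"
    "ConfigStatus: Array0 SeqNo0\tLocation: Encl1 Slot1\n"
    "ModelNo: ATA Hitachi HDS72161\tVisibleTo: All Controllers\n"
)
print(visible_to_all(parse_phydrv(sample)))  # True
```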

No part of this document may be reproduced or transmitted in any form without the expressed, written permission of Promise Technology, Inc.