Certain physical drive conditions can cause one of the RAID controllers in a dual-controller subsystem to enter “maintenance mode”.
- There is a signal path from both RAID controllers to each physical drive in the enclosure.
- SATA drives have only one port, so an AAMUX is required to provide a port to the drive for each controller.
- SAS drives have two ports, so no AAMUX is required to use them in the enclosure.
- Both RAID controllers must have access to each drive for full redundancy.
For this example, consider a Disk Array with four drives striped in a RAID 5 Logical Drive. Drive 3 has failed and dropped off-line, so the operational state of the Logical Drive is now Critical: a second drive failure will cause the Logical Drive to go off-line.
If a second drive encounters serious hardware or media errors before the rebuild of the failed drive has completed, the RAID controller will continuously attempt to get that drive to respond, because RAID 5 cannot tolerate a second drive failure.
However, if the failing drive is so busy handling internal errors that it does not respond to one of the RAID controllers, the path between them will be broken or disconnected, and that controller will go off-line into maintenance mode. This prevents the operational state of the Logical Drive from changing to Failed due to another drive failure, which protects the data it contains.
This scenario also applies to a RAID 6 Logical Drive that has two failed physical drives.
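The relationship between failed drives and operational state described above can be sketched as follows. This is only an illustrative model, not controller firmware: the function name and the "Offline" label are assumptions made for the example, while the tolerance values are the standard ones for RAID 5 (one drive) and RAID 6 (two drives).

```python
# Number of drive failures each RAID level can tolerate (standard definitions).
FAULT_TOLERANCE = {5: 1, 6: 2}


def logical_drive_status(raid_level, failed_drives):
    """Return an illustrative operational state for a Logical Drive.

    Hypothetical helper: the real controller tracks more states than this.
    """
    tolerance = FAULT_TOLERANCE[raid_level]
    if failed_drives == 0:
        return "OK"
    if failed_drives < tolerance:
        # Redundancy reduced but not exhausted (RAID 6 with one failed drive).
        return "Degraded"
    if failed_drives == tolerance:
        # No redundancy left: one more failure takes the Logical Drive off-line.
        return "Critical"
    return "Offline"


print(logical_drive_status(5, 1))  # RAID 5 with one failed drive
print(logical_drive_status(6, 2))  # RAID 6 with two failed drives
```

Both calls fall into the Critical branch, which is the state in which a further drive disconnect triggers the protective behavior.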
Note: There is a very small chance that the disconnected state could be the result of a bad path due to a failure of the physical drive’s AAMUX module.
Controller fail-over to maintenance mode will not occur when a physical drive is disconnected from either RAID controller if the Logical Drive it belongs to has an operational status of OK (or Degraded, for RAID 6 LDs). Instead, the drive’s operational status will be changed to Dead and it will be dropped from the array.
Controller fail-over to maintenance mode will occur when a drive is disconnected from either RAID controller and the Logical Drive it belongs to has an operational status of Critical.
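The two rules above amount to a single decision based on the Logical Drive's status at the moment the drive is disconnected. Here is a minimal sketch of that decision. The enum, function name and return strings are hypothetical labels chosen for illustration and do not correspond to any actual controller interface.

```python
from enum import Enum


class LDStatus(Enum):
    OK = "OK"
    DEGRADED = "Degraded"  # RAID 6 LD with one failed drive
    CRITICAL = "Critical"  # no redundancy left


def on_drive_disconnect(ld_status):
    """Return the action taken when a member drive is disconnected
    from one of the RAID controllers (illustrative model only)."""
    if ld_status is LDStatus.CRITICAL:
        # Losing another drive would fail the Logical Drive, so the
        # controller that lost the path goes into maintenance mode.
        return "controller enters maintenance mode"
    # Redundancy is still available: drop the drive from the array.
    return "drive marked Dead and dropped from the array"


for status in LDStatus:
    print(status.value, "->", on_drive_disconnect(status))
```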
Event Log Entry
Here is an example of an Event Log entry for a controller fail-over due to this condition:
SeqNo: 1 Device: Ctrl 1
EventId: 0x00040032 Severity: Critical
TimeStamp: Mth Day, Year Time DefaultId: 0x00040032
Description: Partner Controller has entered maintenance mode to protect user data since one of the configured physical drives was disconnected in the partner controller.
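If event logs are exported as text, an entry like the one above can be picked out with a small script. The sketch below assumes the field names and layout shown in the example. The helper names are hypothetical, and matching on EventId 0x00040032 alone assumes that ID is not reused for other events.

```python
import re

# Field names as they appear in the example entry.
FIELDS = ("SeqNo", "Device", "EventId", "Severity",
          "TimeStamp", "DefaultId", "Description")
FIELD_RE = re.compile(r"\b(" + "|".join(FIELDS) + r"):\s*")


def parse_event(text):
    """Split one event log entry into a {field: value} dictionary."""
    # re.split keeps the captured field names: ['', key1, val1, key2, val2, ...]
    parts = FIELD_RE.split(text)
    return {key: value.strip() for key, value in zip(parts[1::2], parts[2::2])}


def is_maintenance_failover(event):
    """True if the entry carries the EventId from the example above."""
    return event.get("EventId") == "0x00040032"


SAMPLE = """SeqNo: 1 Device: Ctrl 1
EventId: 0x00040032 Severity: Critical
TimeStamp: Mth Day, Year Time DefaultId: 0x00040032
Description: Partner Controller has entered maintenance mode to protect user data since one of the configured physical drives was disconnected in the partner controller."""

event = parse_event(SAMPLE)
print(event["Device"], event["Severity"], is_maintenance_failover(event))
```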