Kit Name:  ALPMC01_071

Kits superseded by this kit:  None

Kit Description:

Version(s) of OpenVMS to which this kit may be applied:

   OpenVMS Alpha V7.1

In order to receive the full fixes listed in this kit, the following
remedial kits also need to be installed:

   None

Files patched or replaced:

   o  [SYS$LDR]SYS$MCDRIVER_NEW.EXE (new image)
   o  [SYS$LDR]SYS$MCDRIVER.EXE (new image)
   o  [SYSMSG]MCMSG.EXE (new image)
   o  [SYS$LDR]SYS$PMDRIVER.EXE (new image)

Problems addressed in ALPMC01_071 kit:

   o  An OpenVMS Cluster node, connected by the MEMORY CHANNEL
      cluster interconnect, might be set permanently offline, stall,
      or crash when the MEMORY CHANNEL cluster experienced an
      abnormally high error rate during reinitialization of the
      MEMORY CHANNEL nodes.

      In an OpenVMS Cluster system that uses the MEMORY CHANNEL
      cluster interconnect, each MEMORY CHANNEL port reinitializes
      the other MEMORY CHANNEL ports whenever one of the following
      events occurs:

         o  Hardware error specific to MEMORY CHANNEL
         o  Error on a device served over the MEMORY CHANNEL
         o  SCS connection over the MEMORY CHANNEL is lost
         o  Virtual circuit over the MEMORY CHANNEL is lost

      During this reinitialization, an abnormally high error rate or
      other type of high fault insertion rate can cause a MEMORY
      CHANNEL port to lose synchronization with the other MEMORY
      CHANNEL ports. When a MEMORY CHANNEL port is no longer
      synchronized with the other MEMORY CHANNEL ports, one of the
      following events may happen:

         o  The unsynchronized MEMORY CHANNEL port is set
            permanently offline.

         o  A node that is waiting for data structure updates from
            the node with the unsynchronized MEMORY CHANNEL port
            stalls. The data structure updates cannot be delivered
            because the sending node's MEMORY CHANNEL port is no
            longer synchronized with the other MEMORY CHANNEL ports.

         o  The node with the unsynchronized MEMORY CHANNEL port can
            crash due to corrupted MEMORY CHANNEL data structures.
            An additional sympathetic crash can also occur.
      A node that is stalled and a node whose MEMORY CHANNEL port is
      set permanently offline cannot rejoin the MEMORY CHANNEL
      cluster without a reboot. If a second OpenVMS Cluster
      interconnect is present, the node can fail over to that
      interconnect and continue to participate in an OpenVMS Cluster
      system. However, it cannot use the MEMORY CHANNEL cluster
      interconnect until it is rebooted.

      The drivers in this kit prevent the problems associated with
      an abnormally high error rate during reinitialization of
      MEMORY CHANNEL nodes from occurring.

   o  Some MEMORY CHANNEL messages were difficult to read because
      the formatting was flawed.

      The formatting problems have been fixed. In addition,
      timestamps have been added to all error messages.

Kit Installation Rating:

The following kit installation rating, based upon current CLD
information, is provided to serve as a guide as to which customers
should apply this remedial kit. (Reference attached Disclaimer of
Warranty and Limitation of Liability Statement)

   INSTALLATION RATING:

   2 : To be installed by all customers using the following
       feature(s):

       To be installed by all customers with Memory Channel
       VMSclusters experiencing periodically high error rates, a
       high connection failure rate, or a high Virtual Circuit
       closure rate, and by those customers who have Memory Channel
       VMSclusters with more than 4 nodes.

       Note that while these new Memory Channel drivers can
       withstand higher error rates and more connection and Virtual
       Circuit failures, the cause of the high failure rate should
       still be addressed and fixed. The fact that these new drivers
       recover from this condition should not be used as a solution,
       because it only masks the real problem.

Installation Instructions:

   _______________________ Caution _______________________

   Do not install these new drivers on a node in a MEMORY CHANNEL
   cluster while other nodes are running the original OpenVMS
   Version 7.1 MEMORY CHANNEL drivers.
   Attempting to use these new drivers in a cluster with the
   original drivers can stall the cluster and can also cause disk
   corruption.
   ______________________________________________________

These new drivers are not compatible with the original OpenVMS
Version 7.1 MEMORY CHANNEL drivers. This kit does not support a
rolling upgrade if MEMORY CHANNEL is the only cluster interconnect.

SINGLE CLUSTER INTERCONNECT:

If Memory Channel is the only cluster interconnect, OpenVMS
Engineering recommends the following procedure.

   1)  Shut down the entire cluster.

   2)  Reboot one node. Install the kit on that node. Doing this
       will ensure that SYS$SPECIFIC: does not contain any older
       versions of the Memory Channel driver.

       Rename SYS$COMMON:[SYS$LDR]SYS$MCDRIVER.EXE
           to SYS$COMMON:[SYS$LDR]SYS$MCDRIVER.EXE_MESSAGE_ONLY

       Rename SYS$COMMON:[SYS$LDR]SYS$MCDRIVER_NEW.EXE
           to SYS$COMMON:[SYS$LDR]SYS$MCDRIVER.EXE;

   3)  Shut down this node.

   4)  Repeat steps 2 through 3 for all remaining Memory Channel
       capable nodes of the cluster. Doing this will ensure that
       SYS$SPECIFIC: does not contain any older versions of the
       Memory Channel driver.

   5)  The cluster can now be rebooted.

MULTIPLE CLUSTER INTERCONNECT:

If each node in the VMScluster has another interconnect to support
cluster traffic, the following procedure is recommended. To prevent
any interaction between the V7.1 SSB Memory Channel driver and the
driver in this kit, each node will require two reboots.

   NOTE: Steps 6 and 7 must be done on every Memory Channel capable
         node before moving on to step 8.

   6)  Install this kit on one of the Memory Channel capable nodes
       in the VMScluster. Doing this will ensure that SYS$SPECIFIC:
       does not contain any older versions of the Memory Channel
       driver.

   7)  Reboot this node. The use of the Memory Channel will be
       disabled on this node, and a warning message will be
       displayed, instructing that these INSTALLATION INSTRUCTIONS
       should be read.
       The use of this interim driver will cause cluster
       communication to fail over to some other available cluster
       interconnect.

   NOTE: Steps 6 and 7 must be done on every Memory Channel capable
         node before moving on to step 8.

   8)  On one node:

       Rename SYS$COMMON:[SYS$LDR]SYS$MCDRIVER.EXE
           to SYS$COMMON:[SYS$LDR]SYS$MCDRIVER.EXE_MESSAGE_ONLY

       Rename SYS$COMMON:[SYS$LDR]SYS$MCDRIVER_NEW.EXE
           to SYS$COMMON:[SYS$LDR]SYS$MCDRIVER.EXE

   9)  Reboot this node.

   10) Repeat steps 8 through 9 for all remaining nodes in the
       cluster.

Install this kit with the VMSINSTAL utility by logging into the
SYSTEM account and typing the following at the DCL prompt:

   @SYS$UPDATE:VMSINSTAL ALPMC01_071 [location of the saveset]

The saveset location may be a tape drive or a disk directory that
contains the kit saveset.

After you have completed these steps, all nodes will be back up and
running cluster communication traffic over the MEMORY CHANNEL.

**NOTE** Customers will need to install ALPCPU101_071 in order to
provide Memory Channel support for the following systems:

   - AS1000a-5/266
   - AS800-5/333
   - AS800 5/400
   - AS1000 5/266

The system should be rebooted after successful installation of the
kit. If you have other nodes in your VMScluster, they should also be
rebooted in order to make use of the new image(s).

Copyright (c) Digital Equipment Corporation, 1997 All Rights
Reserved. Unpublished rights reserved under the copyright laws of
the United States.

The software contained on this media is proprietary to and embodies
the confidential technology of Digital Equipment Corporation.
Possession, use, or dissemination of the software and media is
authorized only pursuant to a valid written license from Digital
Equipment Corporation.

DISCLAIMER OF WARRANTY AND LIMITATION OF LIABILITY

THIS PATCH IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND.
ALL EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS AND WARRANTIES,
INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR
PARTICULAR PURPOSE, OR NON-INFRINGEMENT, ARE HEREBY EXCLUDED TO THE
EXTENT PERMITTED BY APPLICABLE LAW. IN NO EVENT WILL DIGITAL BE
LIABLE FOR ANY LOST REVENUE OR PROFIT, OR FOR SPECIAL, INDIRECT,
CONSEQUENTIAL, INCIDENTAL OR PUNITIVE DAMAGES, HOWEVER CAUSED AND
REGARDLESS OF THE THEORY OF LIABILITY, WITH RESPECT TO ANY PATCH
MADE AVAILABLE HERE OR TO THE USE OF SUCH PATCH.
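
Example of the driver rename steps:

The two renames called for in the installation instructions (step 2
of the single-interconnect procedure, step 8 of the
multiple-interconnect procedure) might be entered at the DCL prompt
roughly as sketched below. This is only an illustration of those
steps, using the file names given in the instructions. The DIRECTORY
command is an assumption added here as one possible way to look for
node-specific copies of the driver; it is not part of the documented
procedure.

```
$! Assumption, not a step from the kit: list any node-specific
$! copies of the Memory Channel driver before renaming.
$ DIRECTORY SYS$SPECIFIC:[SYS$LDR]SYS$MCDRIVER*.EXE
$!
$! Set the currently installed driver aside under the name given
$! in the installation instructions.
$ RENAME SYS$COMMON:[SYS$LDR]SYS$MCDRIVER.EXE -
         SYS$COMMON:[SYS$LDR]SYS$MCDRIVER.EXE_MESSAGE_ONLY
$!
$! Give the new driver the name SYS$MCDRIVER.EXE.
$ RENAME SYS$COMMON:[SYS$LDR]SYS$MCDRIVER_NEW.EXE -
         SYS$COMMON:[SYS$LDR]SYS$MCDRIVER.EXE
```

In the multiple-interconnect procedure, these commands belong only
after steps 6 and 7 have been completed on every Memory Channel
capable node, and each node must be rebooted after its renames.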