Technical Update for TruCluster Server Version 5.1B and Higher
March 2005

© 2005 Hewlett-Packard Company

This online supplement contains information about restrictions and problems that have been discovered since the HP TruCluster Server Version 5.1B software began shipping to customers.

For supplemental notes about the operating system and layered products, see the Technical Updates for the Version 5.1B and Higher Operating System and Patches at the following URL:

http://www.tru64unix.compaq.com/docs/pub_page/os_update.html

Issues and Problems

The following sections describe issues and known problems with TruCluster Server Version 5.1B and higher that were uncovered after the publication of the Cluster Release Notes.

Sections are ordered by date, with the most recent entries first.

March 2, 2005: Login Failure Possible with Rolling Upgrade and C2 Security Enabled

Login failures may occur as a result of a rolling upgrade on systems with Enhanced Security (C2) enabled. The failures may be exhibited in two ways: the problem may occur after the initial reboot of the lead cluster member, or after the rolling upgrade is completed and the clu_upgrade switch procedure has been run.

The following sections describe the steps you can take to prevent the problem or to correct it after it occurs.

Preventing the problem

You can prevent this problem by performing the following steps before beginning the rolling upgrade:

  1. Disable the prpasswdd daemon from running on the cluster:

    # rcmgr -c set PRPASSWDD_ARGS \
    "`rcmgr get PRPASSWDD_ARGS` -disable"
    

  2. Stop the prpasswdd daemon on every node in the cluster:

    # /sbin/init.d/prpasswd stop
    

  3. Perform the rolling upgrade procedure through the clu_upgrade switch step and reboot all the cluster members.

  4. Perform one of the following actions:

  5. Check that PRPASSWDD_ARGS is now set to what you expect:

    # rcmgr get PRPASSWDD_ARGS
    

  6. Start the prpasswdd daemon on every node in the cluster:

    # /sbin/init.d/prpasswd start
    

  7. Complete the rolling upgrade.
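The prevention sequence above can be sketched as two shell functions. This is a sketch, not HP's procedure: the PRPASSWD variable stands in for the /sbin/init.d/prpasswd script so the functions can be exercised outside a cluster, and the elided step 4 (resetting PRPASSWDD_ARGS) is only noted in a comment.

```shell
# Sketch of the prevention sequence (steps 1-2 and 5-6 above).
# PRPASSWD stands in for the init script so the sketch is testable;
# on a real member it defaults to the path used in the steps.
PRPASSWD=${PRPASSWD:-/sbin/init.d/prpasswd}

disable_prpasswdd() {
    # Step 1: append -disable to the clusterwide PRPASSWDD_ARGS value
    rcmgr -c set PRPASSWDD_ARGS "$(rcmgr get PRPASSWDD_ARGS) -disable"
    # Step 2: stop the daemon (run on every cluster member)
    "$PRPASSWD" stop
}

reenable_prpasswdd() {
    # Step 5: after PRPASSWDD_ARGS has been reset (step 4), verify it
    echo "PRPASSWDD_ARGS: $(rcmgr get PRPASSWDD_ARGS)"
    # Step 6: start the daemon (run on every cluster member)
    "$PRPASSWD" start
}
```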

Correcting the problem

If you have already encountered the problem, perform the following steps to clear it:

  1. Restart the prpasswdd daemon on every node in the cluster:

    # /sbin/init.d/prpasswd restart
    

  2. Reboot the lead cluster member.

  3. Check to see if the problem has been resolved. If it has been resolved, you are finished. If you still see the problem, continue to step 4.

  4. Try to force a change to the auth database by performing the following steps:

    1. Use edauth to add a harmless field to an account; the exact commands depend on your editor. For example, pick an account that does not have a vacation period set and add u_vacation_end:

      # edauth
      s/:u_lock@:/:u_vacation_end#0:u_lock@:/
      w
      q
      

    2. Check to see that the u_vacation_end#0 field was added to the account:

      # edauth -g
      

    3. Use edauth to remove the u_vacation_end#0 field from the account.

    If the edauth commands fail, do not stop. Continue with the following instructions.
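The substitution in step 4.1 can be previewed outside edauth by running the same s/// pattern through sed. The sample entry below is purely illustrative (not a real extended profile); the replacement keeps the leading colon so the fields stay colon-separated.

```shell
# Preview of the step 4.1 substitution using sed. The sample line is
# illustrative only; real entries come from edauth. The leading colon
# is preserved in the replacement so fields remain colon-separated.
sample='someuser:u_name=someuser:u_pwd=*:u_lock@:chkent:'
echo "$sample" | sed 's/:u_lock@:/:u_vacation_end#0:u_lock@:/'
# -> someuser:u_name=someuser:u_pwd=*:u_vacation_end#0:u_lock@:chkent:
```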

  5. Check to see if the problem has been resolved. If it has been resolved, you are finished.

    If you still see the problem, observe the following warning and continue to step 6.

    Warning

    Continue with the following steps only if the following conditions are met:

    • You encountered the described problem while doing a rolling upgrade of a cluster running Enhanced Security.

    • You performed all previous steps.

    • All user authentications (logins) still fail.

  6. Disable logins on the cluster by creating the file /etc/nologin:

    # touch /etc/nologin
    

  7. Disable the prpasswdd daemon from running on the cluster:

    # rcmgr -c set PRPASSWDD_ARGS \
    "`rcmgr get PRPASSWDD_ARGS` -disable"
    

  8. Stop the prpasswdd daemon on every node in the cluster:

    # /sbin/init.d/prpasswd stop
    

  9. Force a checkpoint of the authentication database, using the db_checkpoint command with the -1 (number one) option:

    # /usr/tcb/bin/db_checkpoint -1 -h /var/tcb/files
    

    Continue with the instructions even if this command fails.

  10. Delete the files in the dblogs directory:

    # rm -f /var/tcb/files/dblogs/*
    

  11. Force a change to the auth database, as described in step 4:

    Warning

    If the edauth command fails, do not proceed further. Contact HP support.

  12. If the edauth command was successful, perform one of the following actions:

  13. Check that PRPASSWDD_ARGS is now set to what you expect:

    # rcmgr get PRPASSWDD_ARGS
    

  14. Start the prpasswdd daemon on every node in the cluster:

    # /sbin/init.d/prpasswd start
    

  15. Re-enable logins on the cluster by deleting the file /etc/nologin:

    # rm /etc/nologin
    

  16. Check to see if the problem has been resolved. If it has not, contact HP support.
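Steps 6 through 10 of the recovery procedure can be sketched as one function. NOLOGIN, PRPASSWD, TCB_FILES, and DB_CHECKPOINT default to the paths given in the steps; they are variables only so the sketch can be exercised outside a cluster.

```shell
# Sketch of recovery steps 6-10 above. The variables default to the
# paths used in the steps; override them only for testing.
NOLOGIN=${NOLOGIN:-/etc/nologin}
PRPASSWD=${PRPASSWD:-/sbin/init.d/prpasswd}
TCB_FILES=${TCB_FILES:-/var/tcb/files}
DB_CHECKPOINT=${DB_CHECKPOINT:-/usr/tcb/bin/db_checkpoint}

reset_auth_db_logs() {
    touch "$NOLOGIN"                                # step 6: disable logins
    rcmgr -c set PRPASSWDD_ARGS \
        "$(rcmgr get PRPASSWDD_ARGS) -disable"      # step 7: disable prpasswdd
    "$PRPASSWD" stop                                # step 8: run on every member
    "$DB_CHECKPOINT" -1 -h "$TCB_FILES" || true     # step 9: may fail; continue
    rm -f "$TCB_FILES"/dblogs/*                     # step 10: delete the logs
}
```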

Feb. 22, 2005: Shared SCSI Support on AlphaServer DS15 Systems

Tru64 UNIX supports a shared SCSI bus using the DS15 embedded external SCSI port and the following Tri-link cabling kit:

Part Number     Description
3X-H8411-BA     Shared SCSI Adapter Kit for AlphaServer DS15 (Tri-link)
3X-BN56A-03     3-meter terminator cable
3X-BN56A-04     4-meter terminator cable

This adapter kit and these cables replace the Y-cable based solution previously supported on AlphaServer DS15 systems. The following sections describe the configuration requirements for the shared SCSI bus.

Hardware Configuration

When operating on a shared bus, jumper J41 must be installed on the system board. This disables the on-board SCSI terminators for that SCSI bus.

When not operating on a shared bus, J41 must be removed and all SCSI BIOS parameters must be reset to their default values; otherwise, unpredictable system behavior may result.

See section 4.10.1 of the DS15 Owner's Guide for more information: http://h18002.www1.hp.com/alphaserver/download/ek-ds150-og-a01-web.pdf

[Figure: Tri-link Cable Configuration, showing the cable configuration for the Tri-link cables]

Configuring the Shared SCSI Bus

You can use the information in the Cluster Hardware Configuration Technical Update for the 3X-KZPEA-DB Ultra3 SCSI PCI Host Bus Adapter (http://h30097.www3.hp.com/docs/updates/kzpea/TITLE.HTM) to configure the shared bus. Note that there are minor differences when configuring the AlphaServer DS15:

August 7, 2003: Modifications to System Files Appendix

The "Modifications to System Files" appendix in the Cluster Installation manual lists system files that are created or modified as a result of installing the TruCluster server software. The following files were not listed in that appendix:

/etc/hosts.equiv: add cluster interconnect name
/etc/cfgmgr.auth: add fully qualified hostname
/etc/fstab (on new cluster file system): add the cluster file systems when creating the cluster
/etc/clu_alias.config: add a cluamgr command for the default cluster alias
/etc/securettys: add a ptys entry if set in the current member (clu_add_member)
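The entries above can be spot-checked with a small grep wrapper. This is a sketch of ours, not part of the TruCluster installation; the file names come from the list, while the patterns in the commented examples (such as a cluster interconnect host name) are placeholders for site-specific values.

```shell
# Sketch: verify that an expected entry is present in one of the
# system files listed above. The file and pattern are passed in, so
# the check itself is generic.
check_entry() {
    file=$1 pattern=$2
    if grep -q "$pattern" "$file" 2>/dev/null; then
        echo "ok: $pattern found in $file"
    else
        echo "missing: $pattern in $file"
    fi
}

# Examples (interconnect name and alias command are placeholders):
# check_entry /etc/hosts.equiv member1-ics0
# check_entry /etc/clu_alias.config cluamgr
```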

April 9, 2003: Problems Performing clu_upgrade from Version 5.1 to Version 5.1A and Patch Kit 3

During the analysis phase of a rolling upgrade from Version 5.1 to Version 5.1A and Patch Kit 3, the system may display multiple error messages:

*** Determining installed Tru64 UNIX Worldwide Language Support V5.1 (rev. 89) software ***
 
        Working....Thu Jun 21 16:54:52 EDT 2001
depord: warning, no .ctrl file for "TCRPAT00010000520"
depord: warning, no .ctrl file for "OSFPAT00010000520"
*** Checking for obsolete files ***
.
.
.
.
depord: warning, no .ctrl file for "OSFPAT00010000520"
depord: warning, no .ctrl file for "OSFPAT00000030520"
 
*** Checking file system space ***
 
depord: warning, no .ctrl file for "TCRPAT00010000520"
depord: warning, no .ctrl file for "OSFPAT00010000520"
.
.
.
.
cat: cannot open /instctrl/OSFP
AT00000030520.inv
cat: cannot open /var/cluster/members/{memb}/adm/update/tmpstaydir/instctrl/OSFP
AT00010000520.inv
cat: cannot open /var/cluster/members/{memb}/adm/update/tmpstaydir/instctrl/TCRP
AT00010000520.inv
Update Installation is now ready to begin modifying the files necessary
to reboot the cluster member off of the new OS. Please check the
/var/adm/smlogs/update.log and /var/adm/smlogs/it.log files for errors
after the installation is complete.

These error messages can be ignored; the system will upgrade successfully. However, the messages may prevent you from seeing the question that asks if you want to continue the upgrade. The default answer to this question is no. You must answer yes after the error messages are displayed to continue the upgrade.

February 19, 2003: Corrected Information Regarding CAA User-Defined Attributes

The current documentation for CAA user-defined attributes in the Cluster Highly Available Applications book and the CAA reference pages does not make clear that all user-defined attribute names must begin with the four characters USR_, for example, USR_DEBUG.

The documentation incorrectly states that you can access an attribute as an environment variable with the same name as the attribute. You must actually use the attribute name preceded by an underscore for the environment variable name within an action script. An example of how to use this inside an action script for an attribute named USR_DEBUG would be:

echo $_USR_DEBUG >> debug.log

However, specifying a value for a user-defined attribute on the command line of a caa_start, caa_stop, or caa_relocate command requires that you NOT include a leading underscore. An example for an attribute named USR_DEBUG would be:

caa_start USR_DEBUG=1 resource_name
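As an illustration of both conventions, a minimal action-script fragment that reads the attribute through its environment variable might look like the following. The skeleton is hypothetical and is written as a shell function only so it can be exercised; a real CAA action script is a standalone file invoked with start or stop. Only the USR_DEBUG and _USR_DEBUG naming comes from the correction above.

```shell
# Hypothetical action-script fragment: a user-defined attribute named
# USR_DEBUG is visible inside the script as _USR_DEBUG. Written as a
# function here; a real action script is a standalone file.
action() {
    case "$1" in
    start)
        # e.g. started with: caa_start USR_DEBUG=1 resource_name
        if [ "${_USR_DEBUG:-0}" != "0" ]; then
            echo "debug level: $_USR_DEBUG"
        fi
        ;;
    stop)
        : # stop actions go here
        ;;
    esac
}
```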

Once you add a user-defined attribute to the application.tdf file, any existing application that needs to use that attribute can include a new value for that attribute in its profile. Updating the registration (caa_register -u resource_name) is not sufficient to begin using the new user-defined attributes and values. After the application.tdf file and/or profile are modified, any application using the new attributes must be unregistered and registered again:

caa_unregister resource_name
caa_register resource_name
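The unregister/register pair can be wrapped in a small helper; the helper name is ours, while the two caa commands are the ones from the text.

```shell
# Helper (name is ours) for the re-registration sequence above: an
# application must be unregistered and registered again before new
# user-defined attributes take effect.
reregister_resource() {
    caa_unregister "$1" && caa_register "$1"
}
```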

December 5, 2002: Output of caa_report Command Can Be Incorrect

In rare circumstances, the data that the caa_report command uses to create its output can become corrupted, resulting in incorrect report output.

There is no workaround.

December 5, 2002: The HTML Version of Online Help for CAA and Cluster Alias is not Installed Correctly

The HTML version of the online help for Cluster Application Availability and Cluster Alias is not correctly untarred when the TruCluster package is installed. Therefore, online help for these packages will not work when called from SysMan Station, from SysMan Menu on a PC, or from a Web browser. Online help functions correctly with the X Windows version of SysMan Menu and with the character-cell terminal version.

To fix this problem, you must untar the HTML packages as follows (note that the target of the cd command is language-specific; the example shows the directory for en_US.ISO8859-1):

# cd /usr/share/sysman/web/suitlet_help/html/en_US.ISO8859-1/
 
# gzip -dc help_caaman.tar.gz | tar xvf -
 
# gzip -dc help_clua.tar.gz | tar xvf -
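The two extractions above can also be done in one loop. HELPDIR defaults to the en_US.ISO8859-1 directory shown; the variable form is only so the sketch can be tried outside a cluster, and the tarball names are the two from the commands above.

```shell
# Sketch: extract both help tarballs in one loop. The directory is
# language-specific; en_US.ISO8859-1 is the example used above.
HELPDIR=${HELPDIR:-/usr/share/sysman/web/suitlet_help/html/en_US.ISO8859-1}

untar_help() {
    cd "$HELPDIR" || return 1
    for tarball in help_caaman.tar.gz help_clua.tar.gz; do
        if [ -f "$tarball" ]; then
            gzip -dc "$tarball" | tar xf -
        fi
    done
}
```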

December 5, 2002: Behavior of Disconnected Members in a Cluster with LAN Cluster Interconnect

Section 2.2.2, Memory Channel Restrictions, of the Cluster Hardware Configuration manual describes the behavior of a cluster member that loses connectivity to the Memory Channel cluster interconnect. The following paragraphs alternate published information from that section (marked [MC]) with additional information (marked [LAN]) for clusters that use a LAN cluster interconnect.

[MC] If you configure a cluster with a single rail Memory Channel in standard hub mode and the hub fails, or is powered off, every cluster member panics. They panic because no member can see any of the other cluster members over the Memory Channel interface. A quorum disk does not help in this case, because no system is given the opportunity to obtain ownership of the quorum disk and survive.

[LAN] In a similar LAN cluster, disconnected members do not panic. You need to explicitly reconnect and reboot them in order for them to rejoin the cluster. (Essentially, in both MC hub and LAN clusters, disconnected members must go down (panic or reboot) in order to rejoin the cluster.)

[MC] To prevent this situation in virtual hub mode (two member systems connected without a Memory Channel hub), install a second Memory Channel rail. A failure on one rail will cause failover to the other rail.

[LAN] Two-node LAN clusters without a quorum disk behave the same way as two-node virtual hub clusters without a quorum disk, except that reconnecting the disconnected node will not (as in an MC cluster) cause the cluster to reform. You must reboot the disconnected node in a LAN cluster for this to happen. Two-node clusters with a quorum disk behave in the same way with either interconnect.

October 3, 2002: Error in the Cluster Highly Available Applications Manual

Section 4.3 in the Cluster Highly Available Applications manual incorrectly lists signals as not supported for clusterwide interprocess communication. Clusterwide signals are supported in Version 5.1B.

October 2, 2002: Benign WLS Error Messages During a Rolling Upgrade

During a cluster rolling upgrade installation, after loading the Worldwide Language Support (WLS) subsets, the console might display missing status file messages similar to the following:

 289 of 289 subsets installed successfully.
 
 *** Starting protofile merges for Tru64 UNIX Worldwide Language \
     Support V5.1B (rev. 231) ***
 
 *** Finished protofile merges for Tru64 UNIX Worldwide Language \
     Support V5.1B (rev. 231)  ***
 
  find: /usr/cluster/members/member1/.smdb./IOSWWBASE520.sts : No \
        such file or directory
  find: /usr/cluster/members/member1/.smdb./IOSZHBASE520.sts : No \
        such file or directory
  find: /usr/cluster/members/member0/.smdb./IOSWWBASE520.sts : No \
        such file or directory
 
 *** Starting configuration merges for Update Install ***

You can ignore these messages; they do not affect the install stage.

September 26, 2002: Outdated Illustration in the HTML Version of the Cluster Hardware Configuration Manual

The HTML version of the Version 5.1B TruCluster Server Cluster Hardware Configuration manual contains an outdated illustration. Figure 9-1, TruCluster Server Cluster with a TL891 on Two Shared Buses, shows a RAID Array 7000 with HSZ70 controllers. The HSZ70 controllers are not supported with Version 5.1B. This illustration was correctly updated to show a RAID Array 8000 in the PDF version of the manual, but was not updated in the HTML version.

September 26, 2002: cluamgr(8) Description of Alias Behavior When All Members Set rpri=0 Is Incorrect

When describing the behavior of the cluster alias rpri attribute, cluamgr(8) states that even if all members of an alias set rpri=0, a member of that alias will be elected as the proxy ARP master. In reality, if all members of an alias set rpri=0, no proxy ARP master is elected and clients will not be able to access the cluster via that alias.
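To avoid this condition, at least one member joining the alias should carry a nonzero rpri. A minimal sketch follows; the alias name is a placeholder, and the comma-separated attribute syntax follows cluamgr(8).

```shell
# Sketch: join an alias with a nonzero router priority so a proxy ARP
# master can be elected. The alias name is a placeholder; run this on
# the member that should answer ARP requests for the alias.
set_alias_rpri() {
    alias_name=$1 priority=$2
    cluamgr -a alias="$alias_name",rpri="$priority"
}

# Example (placeholder alias):
# set_alias_rpri clua_services 10
```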

Comments and Questions

We value your comments and questions on the information in this document. Please mail your comments to us at this address:

readers_comment@zk3.dec.com

Legal Notice

UNIX® and The Open Group™ are trademarks of The Open Group in the U.S. and/or other countries.

All other product names mentioned herein may be trademarks of their respective owners.

Confidential computer software. Valid license from HP required for possession, use, or copying. Consistent with FAR 12.211 and 12.212, Commercial Computer Software, Computer Software Documentation, and Technical Data for Commercial Items are licensed to the U.S. Government under vendor's standard commercial license.

The information contained herein is subject to change without notice. The only warranties for HP products and services are set forth in the express warranty statements accompanying such products and services. Nothing herein should be construed as constituting an additional warranty. HP shall not be liable for technical or editorial errors or omissions contained herein.