The following sections describe some of the key
features and enhancements that were first delivered in previous patch
kits.
Select Option to Check Tagged Files |
 |
During the preinstall stage of a rolling upgrade,
you have the option of checking tagged files. You should override the
default setting and select the check tag option. The reason for
selecting this option is described in “Check for Tagged Files if Messages Are Displayed”.
Check for Tagged Files if Messages Are Displayed |
 |
When installing this patch kit during a rolling
upgrade, you may see the following error and warning messages during
the setup stage:
Creating tagged files.
*** Error ***
The tar commands used to create tagged files in the '/usr' file system have
reported the following errors and warnings:
tar: lib/nls/msg/en_US.88591/ladebug.cat : No such file or directory
*** Warning ***
The above errors were detected during the cluster upgrade. If you believe that
the errors are not critical to system operation, you can choose to continue.
If you are unsure, you should check the cluster upgrade log and refer
to clu_upgrade(8) before continuing with the upgrade. |
If
you see these messages during the setup stage, you should verify that
the tagged files were properly created when you execute the preinstall
stage.
In cases where the tagged files are not created,
you can repeat the setup stage.
Noncritical Errors |
 |
During a rolling upgrade to install this patch kit,
you may encounter the following noncritical situations:
The tagged file for ifaccess.conf
(.Old..ifaccess.conf) may disappear. This
error will not cause any problems with the rolling upgrade
procedure or the installation of the kit. A message alerts
you to this condition if you use the clu_upgrade
undo command. Running the clu_upgrade -v
check setup command at the start of the procedure fixes
this error.
When the worldwide language subset is installed, an attempt
to tag the file wwinstall will
fail. This error will not affect the operational status of the
cluster.
Unrecoverable Failure Procedure |
 |
The procedure to follow if you encounter
unrecoverable failures while running dupatch during
a rolling upgrade has changed. The new procedure calls for you to run
the clu_upgrade undo install command and then set
the system baseline. The procedure is explained in the
Patch
Kit Installation Instructions as notes in Section
5.3 and Section 5.6.
Do Not Add or Delete OSF, TCR, IOS, or OSH Subsets During
Roll |
 |
During a rolling upgrade, do not use the
/usr/sbin/setld command to add or delete any of the
following subsets:
Base Operating System subsets (those with the
prefix OSF).
TruCluster Server subsets (those with the
prefix TCR).
Worldwide Language Support (WLS) subsets
(those with the prefix IOS).
New Hardware Delivery (NHD) subsets (those
with the prefix OSH).
Adding or deleting these subsets during a roll
creates inconsistencies in the tagged files.
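The prefix rule above can be checked mechanically before any setld invocation. The following is a minimal sketch, not part of the patch kit: the function names is_protected_subset and check_subset are hypothetical, and XYZTOOL100 in the usage note is an invented subset name used only for illustration.

```shell
# Hypothetical helper: succeed if the subset name starts with a prefix
# that must not be added or deleted during a rolling upgrade.
is_protected_subset() {
    case "$1" in
        OSF*|TCR*|IOS*|OSH*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical wrapper: print a verdict and fail for protected subsets,
# so a calling script can stop before it reaches /usr/sbin/setld.
check_subset() {
    if is_protected_subset "$1"; then
        echo "refuse: $1 must not be added or deleted during a roll"
        return 1
    fi
    echo "ok: $1"
}
```

For example, check_subset TCRBASE540 prints a refusal and returns a nonzero status, while check_subset XYZTOOL100 prints ok.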
Undo Stages in Correct Order |
 |
If you need to undo the install stage because the
lead member is in an unrecoverable state, be sure to undo the stages
in the correct order.
During the install stage,
clu_upgrade cannot tell whether the roll is going
forward or backward. This ambiguity incorrectly allows the
clu_upgrade undo preinstall stage to be run before
clu_upgrade undo install. Refer to the
Patch
Kit Installation Instructions for additional
information on undoing a rolling patch.
clu_upgrade undo of Install Stage Can Result in Incorrect File
Permissions |
 |
This note applies only when both of the following
are true:
You are using installupdate,
dupatch, or nhd_install to
perform a rolling upgrade.
You need to undo the install stage;
that is, to use the clu_upgrade undo install
command.
In this situation, incorrect file permissions can
be set for files on the lead member. This can result in the failure of
rsh, rlogin, and other commands
that assume user IDs or identities by means of
setuid.
The clu_upgrade undo install
command must be run from a nonlead member that has access to the lead
member's boot disk. After the command completes, follow these steps:
Boot the lead member to single-user mode.
Run the following script:
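#!/usr/bin/ksh -p
#
# Script for restoring installed permissions
#
cd /
# @(A|B) is the ksh pattern list that matches exactly one of the prefixes
for i in /usr/.smdb./@(OSF|TCR|IOS|OSH)*.sts
do
# Re-verify only subsets whose status file marks them as installed
grep -q "_INSTALLED" "$i" 2>/dev/null && /usr/lbin/fverify -y <"${i%.sts}.inv"
done |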
Rerun installupdate,
dupatch, or nhd_install,
whichever is appropriate, and complete the rolling
upgrade.
For information about rolling upgrades, see the
Patch
Kit Installation Instructions and the
installupdate(8) and clu_upgrade(8) reference pages.
Missing Entry Messages Can Be Ignored During Rolling
Patch |
 |
During the setup stage of a
rolling patch, you might see a message like the following:
Creating tagged files.
............................................................
clubase: Entry not found in /cluster/admin/tmp/stanza.stdin.597530
clubase: Entry not found in /cluster/admin/tmp/stanza.stdin.597568 |
An Entry not found message will
appear once for each member in the cluster. The number in the message
corresponds to a PID.
You can safely ignore this Entry not
found message.
Relocating AutoFS During a Rolling Upgrade on a Cluster |
 |
This note applies only to performing rolling
upgrades on cluster systems that use AutoFS.
During a cluster rolling upgrade, each cluster
member is singly halted and rebooted several times. The
Patch
Kit Installation Instructions direct you to
manually relocate applications under the control of Cluster
Application Availability (CAA) prior to halting a member on which CAA
applications run.
Depending on the amount of NFS traffic, the manual
relocation of AutoFS may sometimes fail. Failure is most likely to
occur when NFS traffic is heavy. The following procedure avoids that
problem.
At the start of the rolling upgrade procedure, use
the caa_stat command to learn which member is
running AutoFS. For example:
# caa_stat -t
Name Type Target State Host
------------------------------------------------------------
autofs application ONLINE ONLINE rye
cluster_lockd application ONLINE ONLINE rye
clustercron application ONLINE ONLINE swiss
dhcp application ONLINE ONLINE swiss
named application ONLINE ONLINE rye |
To minimize your effort in the following procedure, perform the roll
stage last on the member where AutoFS runs.
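If you prefer to pick out the AutoFS host in a script rather than by eye, the lookup can be sketched as follows. This is an illustrative sketch only: the function name caa_host_of is hypothetical, and it assumes the five-column caa_stat -t layout shown in the example above, with the host in the last column.

```shell
# Hypothetical helper: read "caa_stat -t" output on standard input and
# print the Host column for the resource named in $1.
caa_host_of() {
    awk -v name="$1" '$1 == name { print $NF }'
}
```

On a cluster member this would be used as caa_stat -t | caa_host_of autofs. The function prints nothing if the resource is not listed.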
When it is time to perform a manual relocation on
a member where AutoFS is running, follow these steps:
Stop AutoFS by entering the following command on the
member where AutoFS runs:
# /usr/sbin/caa_stop -f autofs |
Perform the manual relocation of other applications
running on that member:
# /usr/sbin/caa_relocate -s current_member -c target_member |
After the member that had been running AutoFS has
been halted as part of the rolling upgrade procedure, restart AutoFS
on a member that is still up. (If this is the roll stage and the
halted member is not the last member to be rolled, you can minimize
your effort by restarting AutoFS on the member you plan to roll
last.)
On a member that is up, enter the following command to
restart AutoFS. (The member where AutoFS is to run,
target_member, must be up and running
in multi-user mode.)
# /usr/sbin/caa_start autofs -c target_member |
Continue with the rolling upgrade procedure.
Messages Displayed During Rolling Upgrade Can Be
Ignored |
 |
You can ignore the following messages if you see
them displayed during a rolling upgrade:
kill: 1048674: no such process
This message may be displayed after the roll stage. For
example:
# clu_upgrade roll
This is the cluster upgrade program.
⋮
The 'roll' stage has completed successfully. This
member must be rebooted in order to run with the newly
installed software.
Do you want to reboot this member at this time? []:y
You indicated that you want to reboot this member at this time.
Is that correct? [yes]:
The 'roll' stage of the upgrade has completed successfully.
kill: 1048674: no such process
# |
rmdir: /var/.clu_upgrade: File exists
This message may be displayed after the clean stage. For
example:
# clu_upgrade clean
This is the cluster upgrade program.
You have indicated that you want to perform the 'clean' stage
of the upgrade.
Do you want to continue to upgrade the cluster? [yes]:
⋮
Deleting tagged files.
.................................................................
.................................................................
.................................................................
.................................................................
...................................Removing back-up and kit files
rmdir: /var/.clu_upgrade: File exists
The 'clean' stage of the upgrade has completed successfully.
# |
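When reviewing captured clu_upgrade output, the two ignorable messages described above can be recognized with a simple pattern match. The sketch below is an illustration under stated assumptions: the function name is_ignorable_message is hypothetical, and the patterns are taken only from the two messages shown in this note (the process ID in the kill message varies, so it is matched with a wildcard).

```shell
# Hypothetical helper: succeed if the line in $1 is one of the two
# rolling-upgrade messages that this note says can be ignored.
is_ignorable_message() {
    case "$1" in
        kill:*": no such process") return 0 ;;
        "rmdir: /var/.clu_upgrade: File exists") return 0 ;;
        *) return 1 ;;
    esac
}
```

Any other message, such as a tar error from the setup stage, is not matched and should still be investigated.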
Error on Cluster Creation |
 |
When you attempt to create a cluster after having
deleted patches, you may see the following error messages:
*** Error ***
This system has only Tru64 UNIX patches installed.
Please install the latest TruCluster Server patches on your system.
You can obtain the most recent patch kit from:
http://www.support.compaq.com/patches/
*** Error ***
The system is not configured properly for cluster creation.
Please fix the previously reported problems, and then rerun the
'clu_create' command. |
If you see these messages, enter the following
command:
# ls -tlr /usr/.smdb./*PAT*.sts |
If this command returns a file with
000000 in its name, you will have to run the
clu_create command with the
-f option to force the creation of
your cluster. The problem is caused by the cluster software
misinterpreting the existence of some patches and will be corrected in
a future patch kit.
If the command does not return a file with
000000 in its name, you will need to contact HP
support to determine the cause of the problem.
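The decision described above depends only on whether any status file name contains 000000, so it can be sketched as a small check. The function name needs_forced_create is hypothetical, and the file names in the usage note are invented examples, not names taken from a real kit.

```shell
# Hypothetical helper: succeed if any of the given paths has a file
# name containing 000000, the case in which clu_create -f is needed.
needs_forced_create() {
    for f in "$@"; do
        case "${f##*/}" in
            *000000*) return 0 ;;
        esac
    done
    return 1
}
```

On the affected system this might be called as needs_forced_create /usr/.smdb./*PAT*.sts. With an invented name such as OSFPAT00000000540.sts the check succeeds; with OSFPAT00012300540.sts it fails, which corresponds to the case where you contact HP support.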
When Taking a Cluster Member to Single-User Mode, First Halt
the Member |
 |
To take a cluster member from multiuser mode to
single-user mode, first halt the member and then boot it to
single-user mode. For example:
# shutdown -h now
>>> boot -fl s |
Halting and booting the system ensures that it
provides the minimal set of services to the cluster and that the
running cluster has a minimal reliance on the member running in
single-user mode.
When the system reaches single-user mode, enter
the following commands:
# /sbin/init s
# /sbin/bcheckrc
# /usr/sbin/lmf reset |
Login Failure Possible with C2 Security Enabled |
 |
Login failures may occur as a result of a rolling
upgrade on systems with Enhanced Security (C2) enabled. The failures
may be exhibited in two ways:
With the following error message:
Can't rewrite protected password entry for user |
With the following set of error messages:
login: Ignoring log file: /var/tcb/files/dblogs/log.00001: magic number 0, not 8
login: log_get: read: I/O error
Can't rewrite protected password entry for user |
The problem may occur after the initial reboot of
the lead cluster member or after the rolling upgrade is completed and
the clu_upgrade switch procedure has been run. The
following sections describe the steps you can take to prevent the
problem or correct it after it occurs.
You can prevent this problem by performing the
following steps before beginning the rolling upgrade:
Disable the prpasswdd daemon from
running on the cluster:
# rcmgr -c set PRPASSWDD_ARGS \
"`rcmgr get PRPASSWDD_ARGS` -disable" |
Stop the prpasswdd daemon on every
node in the cluster:
# /sbin/init.d/prpasswd stop |
Perform the rolling upgrade procedure through the
clu_upgrade switch step and reboot all the
cluster members.
Perform one of the following actions:
If PRPASSWDD_ARGS did not exist
before this upgrade (that is, if rcmgr get
PRPASSWDD_ARGS at this point shows only
-disable), then delete
PRPASSWDD_ARGS:
# rcmgr -c delete PRPASSWDD_ARGS |
If PRPASSWDD_ARGS existed
before this upgrade, then reset
PRPASSWDD_ARGS to the original
string:
# rcmgr -c set PRPASSWDD_ARGS \
"`rcmgr get PRPASSWDD_ARGS | sed 's/ -disable//'`" |
Check that PRPASSWDD_ARGS is now set
to what you expect:
# rcmgr get PRPASSWDD_ARGS |
Start the prpasswdd daemon on every
node in the cluster:
# /sbin/init.d/prpasswd start |
Complete the rolling upgrade.
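The string handling in the steps above (append -disable before the upgrade, then either delete the variable or strip -disable afterward) can be sketched without touching rcmgr. The function names below are hypothetical, and -x in the usage note stands in for whatever arguments your site actually sets. It is not a real prpasswdd option.

```shell
# Hypothetical helper: the value written by the "rcmgr -c set" step,
# given the previous value of PRPASSWDD_ARGS in $1 (possibly empty).
add_disable() {
    printf '%s -disable\n' "$1"
}

# Hypothetical helper: the value restored after the upgrade, using the
# same sed expression as the procedure above.
strip_disable() {
    printf '%s\n' "$1" | sed 's/ -disable//'
}

# Hypothetical helper: print "delete" if the current value holds only
# -disable (the variable did not exist before), otherwise "reset".
restore_action() {
    case "$1" in
        "-disable"|" -disable") echo delete ;;
        *) echo reset ;;
    esac
}
```

For example, add_disable "-x" yields "-x -disable", strip_disable "-x -disable" yields "-x", and restore_action " -disable" prints delete.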
If you have already encountered the problem,
perform the following steps to clear it:
Restart the prpasswdd daemon on every
node in the cluster:
# /sbin/init.d/prpasswd restart |
Reboot the lead cluster member.
Check to see if the problem has been resolved. If it has
been resolved, you are finished. If you still see the problem,
continue to step 4.
Try to force a change to the auth database by performing
the following steps:
Use edauth to add a harmless
field to an account. The exact commands depend on your
editor. For example, pick an account that does not have
a vacation set and add
u_vacation_end:
# edauth
s/:u_lock@:/u_vacation_end#0:u_lock@:/
w
q |
Check to see that the
u_vacation_end#0 field was added to the
account:
Use edauth to remove the
u_vacation_end#0 field from the
account.
If the edauth commands
fail, do not stop. Continue with the following
instructions.
Check to see if the problem has been resolved. If it has
been resolved, you are finished.
If you still see the problem, observe the following
warning and continue to step 6.
Disable logins on the cluster by creating the file
/etc/nologin:
Disable the prpasswdd daemon from
running on the cluster:
# rcmgr -c set PRPASSWDD_ARGS \
"`rcmgr get PRPASSWDD_ARGS` -disable" |
Stop the prpasswdd daemon on every
node in the cluster:
# /sbin/init.d/prpasswd stop |
Force a checkpoint of the auth database,
using the db_checkpoint command with the
-1 (number 1) option:
# /usr/tcb/bin/db_checkpoint -1 -h /var/tcb/files |
Continue with the instructions even if this command
fails.
Delete the files in the dblogs
directory:
# rm -f /var/tcb/files/dblogs/* |
Force a change to the auth database, as
follows:
Use the edauth command to add a
harmless field to an account. The exact commands depend
on your editor. For example, pick an account that does
not have a vacation set and enter the following:
# edauth
s/:u_lock@:/u_vacation_end#0:u_lock@:/
w
q |
Check to see that the
u_vacation_end#0 field was added to
the account:
Use the edauth command to
remove the u_vacation_end#0 field
from the account.
If the edauth command was successful,
perform one of the following actions:
If PRPASSWDD_ARGS did not exist
before this upgrade (that is, if rcmgr get
PRPASSWDD_ARGS at this point shows only
-disable), then delete
PRPASSWDD_ARGS:
# rcmgr -c delete PRPASSWDD_ARGS |
If PRPASSWDD_ARGS existed
before this upgrade, then reset
PRPASSWDD_ARGS to the original
string:
# rcmgr -c set PRPASSWDD_ARGS \
"`rcmgr get PRPASSWDD_ARGS | sed 's/ -disable//'`" |
Check that PRPASSWDD_ARGS is now set
to what you expect:
# rcmgr get PRPASSWDD_ARGS |
Start the prpasswdd daemon on every
node in the cluster:
# /sbin/init.d/prpasswd start |
Re-enable logins on the cluster by deleting the file
/etc/nologin:
Check to see if the problem has been resolved. If it has
not, contact HP support.
File System Unmount Recommended if Message Is Displayed |
 |
Under certain error conditions, the following
message may be seen during a relocation or failover, or during the
boot of a member:
WARNING: Unable to failover /mnt: pfs and cfs fsids differ |
The
result is that the fileset in question is now unserved in the cluster.
For example:
# cfsmgr /mnt
Domain or filesystem name = /mnt
Server Status : Not Served |
If this occurs, we recommend that you immediately
do the following:
Use the following command to unmount the
filesystem:
# cfsmgr -u -p [mountpoint] |
If other mounted filesets exist in the same domain,
unmount them (they should also be in the "Not Served"
state):
For steps on checking an AdvFS domain, see the AdvFS
Administration Guide, Section 6.3.1, steps 3-7.
Run diagnostics on the domain prior to remounting its file
systems.
To verify the domain, you can use the AdvFS
verify utility or the
fixfdmn utility. If using
fixfdmn, we recommend first running it with
the -n option to see what
errors are found prior to allowing fixfdmn
to fix them.
Once you have successfully verified the domain,
remounting the domain's file systems in the cluster should
succeed.
If the domain cannot be immediately verified, we
recommend that you do not remount the original fileset until this can
be done.
Tunable Attribute May Help Performance Problem |
 |
The tunable attribute
cfs_clone_noccr, included in this patch kit, may
correct a problem in which cluster fileset writes that occur
simultaneously with reads of the fileset clone on a cluster client
(for example, during a backup) may result in performance degradation.
This occurs most often when the clone file being read consists of many
thousands of extents (for example, 20,000 or more).
If a degradation during cluster clone reads is
noticeable (for example, the clone read appears to be hanging and
requires a long time to complete), set the value of
cfs_clone_noccr to 1 on the server of the given
fileset. This sysconfig tunable attribute is set to 0 by default and
should be changed only when the degradation is noticeable.
Note that all filesets with clones that are served
by the node on which the attribute is set will also see this change.
It may be advisable (though not required) to have those filesets whose
clone files have fewer extents be served by a different node during
the time the tunable attribute is set.
AlphaServer ES47 or AlphaServer GS1280 Hangs When Added to
Cluster |
 |
If, after you run clu_add_member
to add an AlphaServer ES47 or AlphaServer GS1280 as a member of a
TruCluster, the AlphaServer hangs during its first boot, try rebooting
it with the original V5.1B generic cluster kernel,
clu_genvmunix.
Use the following instructions to extract and copy
the V5.1B cluster genvmunix from your original
Tru64 UNIX kit to your AlphaServer ES47 or AlphaServer GS1280 system.
In these instructions, the AlphaServer ES47 or AlphaServer GS1280 is
designated as member 5. Substitute the appropriate member number for
your cluster.
Insert the Tru64 UNIX Associated Products Disk 2 into the
CD-ROM drive of an active member.
Mount the CD-ROM to /mnt. For
example:
# mount -r /dev/disk/cdrom0c /mnt |
Mount the boot disk of the AlphaServer ES47 or AlphaServer
GS1280 on its specific mount point; for example:
# mount root5_domain#root /cluster/members/member5/boot_partition |
Extract the original clu_genvmunix from
the CD-ROM and copy it to the boot disk of the AlphaServer ES47
or AlphaServer GS1280 member.
# zcat < TCRBASE540 | ( cd /cluster/admin/tmp; \
tar -xf - ./usr/opt/TruCluster/clu_genvmunix)
# cp /cluster/admin/tmp/usr/opt/TruCluster/clu_genvmunix \
/cluster/members/member5/boot_partition/genvmunix
# rm /cluster/admin/tmp/usr/opt/TruCluster/clu_genvmunix |
Unmount the CD-ROM and the boot disk:
# umount /mnt
# umount /cluster/members/member5/boot_partition |
Reboot the AlphaServer ES47 or AlphaServer GS1280.
Problems with clu_upgrade Switch Stage |
 |
If the clu_upgrade switch stage
does not complete successfully, you may see a message like the
following:
versw: No switch due to inconsistent versions |
The problem can be due to one or more members running
genvmunix, a generic kernel.
Use the command clu_get_info
-full and note each member's version number, as reported in
the line beginning
If a
member has a version number different from that of the other members,
shut down the member and reboot it from vmunix,
the custom kernel. If multiple members have different version
numbers, reboot them one at a time from
vmunix.
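Once you have noted each member's version number, spotting the odd one out can be sketched as follows. This sketch does not parse clu_get_info output. It assumes you have already reduced what you noted to one line per member in a hypothetical "member version" format, and the function name odd_members and the sample values are invented for illustration.

```shell
# Hypothetical helper: read "member version" pairs on standard input
# and print the members whose version differs from the most common one.
odd_members() {
    awk '{ ver[$1] = $2; count[$2]++ }
         END {
             best = ""; max = 0
             for (v in count) if (count[v] > max) { max = count[v]; best = v }
             for (m in ver) if (ver[m] != best) print m
         }' | sort
}
```

The members it prints are the candidates to shut down and reboot from vmunix, one at a time. If two versions are equally common, the choice of the "most common" one is arbitrary, so review the list by hand in that case.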
Data Protector Issues and Restrictions |
 |
The following sections describe issues and
restrictions for Version 5.1 of the HP OpenView Storage Data Protector
backup and recovery product when configuring it on a Tru64 UNIX
cluster.
Possible Error Backing Up Cluster Mount Points
When backing up cluster mount points using the
cluster alias as the client name, you may encounter an error in
which the directory is reported as a mount point to a different file
system and is backed up as an empty directory.
To correct this problem, create TruCluster
Server clients as follows:
Create a client for each member host name in the
cluster.
Create another client using the cluster alias name,
selecting it as a virtual host.
You can then create backups using the alias as the
client name.
You may also need to define your mount points to
back up using the manual add function of the Add Backup wizard.
Under some circumstances, backups that are created using the default
device discovery encounter the “backed up as an empty
directory” problem.
Configuring Data Protector for Oracle Integration
When configuring Data Protector for Oracle
integration, libobk.so should be linked with
/usr/omni/lib/libob2oracle8_64bit.so.
The Data Protector UNIX Integration
Guide incorrectly states that it should be linked with
/usr/omni/lib/libob2oracle8_64.so.
Set ipport_userreserved Attribute on Large Systems |
 |
Larger systems can encounter portmapper problems
in a local area network (LAN) cluster if the value of the
ipport_userreserved attribute has not been tuned.
The recommended value is 65535, and the value should be the same on all
cluster members. Set the value before adding the first member.
If this value is not set for a LAN cluster with
larger machines, the machines may run out of ports for interconnect
services. For more information, see the manual Tuning Tru64
UNIX for Internet Servers.
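To confirm that a member carries the recommended value, the attribute can be read back and compared. The sketch below is illustrative only: the function names are hypothetical, and it assumes that the query output consists of a subsystem line followed by an "attribute = value" line, as in the sample in the usage note.

```shell
# Hypothetical helper: read attribute query output on standard input
# and print the value reported for ipport_userreserved.
ipport_value() {
    awk '$1 == "ipport_userreserved" && $2 == "=" { print $3 }'
}

# Hypothetical helper: succeed only if the reported value is 65535,
# the value recommended in this note.
ipport_ok() {
    [ "$(ipport_value)" = "65535" ]
}
```

On a member this might be used as sysconfig -q inet ipport_userreserved | ipport_ok. Given sample input of "inet:" followed by "ipport_userreserved = 5000", ipport_value prints 5000 and ipport_ok fails, which indicates that the attribute still needs to be tuned on that member.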