The following sections provide brief descriptions of
the changes delivered in this patch kit and in previous Version 5.1B
patch kits for the TruCluster Server software products. Each patch provides fixes to subsets of the
operating system. Subset names (listed in italic font in the following
list) consist of three parts; for example, for subset
TCRBASE540, the TCR indicates
that the subset is part of the TruCluster Server product, the
BASE indicates a category, and the
540 indicates that the subset belongs to the
Version 5.1B operating system.

New Patches

The patch summaries in this section describe
changes to the TruCluster Server software products that are new in
this release.

PATCH 28001.00TCRBASE540

Fixes a problem in which a CFS client read operation returns
the wrong data due to stale metadata associated with the file
frag. Adds a check to prevent the caller from binding to a cluster
alias address that the node has not joined. Fixes an infinite loop under certain circumstances in
cms_do_mount_rpc(). Adds an option to unset all flags for a service in
/etc/clua_services. Corrects a reference count issue in the KGS
subsystem. Fixes a node panic with the message
"ics_unable_to_make_progress: netisrs stalled" that occurs
even though the netisr thread
is not actually stalled. Provides a fix for a domain panic caused by hung
I/Os on a busy or faulty disk drive. The panic
can happen after all but one path to the disk drive is disabled
and then re-enabled. Corrects a problem where a 'local open' on a previously
opened tape drive results in an erroneous "no such
device" message. Provides a fix for a cluster boot-time hang, caused by a
faulty quorum disk. Fixes multiple issues with the RDG (Reliable DataGram) component
in a LAN cluster. Fixes an issue with the CFS failover subsystem where, under
certain domain configurations, the failover process may hang. Fixes a problem with fuser(8) where
usage of the -a option leaves the filesystem
incapable of unmounting even if no files or directories on the
filesystem are in use. Fixes a problem where, under certain circumstances, a close
on a socket of type AF_UNIX may result in a
system panic. Provides enhancements to the DRD trace framework. Optimizes the performance of the ics0
interface in a LAN cluster. Fixes an issue with aliasd routing in a cluster. Avoids a panic due to a bad quorum disk during the boot
process. Fixes an issue wherein the Internode Communication Subsystem
panics when it receives messages for an unknown service. Updates the volstat utility and kernel to
report cluster-wide LSM statistics. Adds support in cluster alias to handle socket
unlisten.

PATCH 28002.00TCRMAN540

Provides the latest reference pages for sys_attrs_cfs(5), sys_attrs_clubase(5), and sys_attrs_rdg(5). Updates the clu_alias.config(4) and exports.aliases(4) reference pages. Updates the sys_attrs_icsnet(5) reference page to reflect
the icsnet_mtu attribute. Updates the following reference pages: clua_services(4), cfsd.conf(4), and sys_attrs_ics_ll_tcp(5). Updates the following reference pages: imcs(1), dlm_rd_collect(3), dlm_rd_validate(3), imc_rderrcnt(3), sys_attrs_cms(5), sys_attrs_drd(5), and sys_attrs_icsnet(5).
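
As a rough illustration of the subset naming convention described at the start of this section, the following Python sketch splits a patch identifier such as 28001.00TCRBASE540 into its patch number and the three parts of the subset name (product, category, and version code). The function name parse_patch_id, the regular expression, and the assumption that the product code is always three letters and the version code always three digits are illustrative guesses based only on the examples in these notes, not a documented format.

```python
import re
from typing import NamedTuple


class PatchId(NamedTuple):
    patch_number: str   # e.g. "28001.00"
    product: str        # e.g. "TCR" (TruCluster Server)
    category: str       # e.g. "BASE" or "MAN"
    version_code: str   # e.g. "540" (Version 5.1B)


# Assumed layout: <number>.<number><3 letters><letters><3 digits>
_PATCH_RE = re.compile(r"^(\d+\.\d+)([A-Z]{3})([A-Z]+)(\d{3})$")


def parse_patch_id(text: str) -> PatchId:
    """Split an identifier like '28001.00TCRBASE540' into its parts."""
    match = _PATCH_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized patch identifier: {text!r}")
    return PatchId(*match.groups())


if __name__ == "__main__":
    for name in ("28001.00TCRBASE540", "28002.00TCRMAN540"):
        print(parse_patch_id(name))
```

Identifiers that do not follow the assumed layout raise ValueError rather than being silently misparsed.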

Patches Delivered in Previous Kits

The following TruCluster Server patches were
delivered in previous Version 5.1B patch kits. These patches will be
installed on your system if you did not install the previous
kit.

PATCH 27001.00TCRBASE540

Eliminates numerous panics and hung devices by fixing drd so
it no longer accesses a device that has a deletion pending or in
progress. Fixes an RM simple lock timeout issue that may occur in
noisy Memory Channel rails. Enhances the error message generated when the clu_bdmgr
command cannot access a member boot disk. Fixes a configuration issue found in non-CAM devices and
CD-ROM devices. Fixes the cause of potential cluster hangs during some
Memory Channel hardware failures that result in an MC rail
failover. Fixes the CFS AIO write error path so the I/O completion
steps are not repeated. Fixes a flaw in CFS file locking code that causes a "vrele:
bad ref count" panic. Fixes the cause of an assertion failure in
cfs_vnops.c. Corrects a problem in which the simultaneous booting of
multiple nodes results in a panic due to an unknown node in a
remote member node list. Corrects a problem in a Memory Channel cluster in which a
panic occurs in a booted member when a booting member goes down
because of panic/halt/shutdown. Fixes a problem in which a thread enters dio code while an
extent map is being refreshed. Fixes a problem in which v_numoutput is not decremented on AIO/DIO
error paths. Removes the cause of a panic that may occur in CFS at boot
time if a remote node goes down. Corrects several ICS signal-forwarding issues. Fixes a race between the close system call for a block
device file and the recovery process for the file system. Clarifies a usage message seen with the cfsstat
command. Corrects a problem in the clu_mibs daemon that can cause various
eSNMP subagents, such as pmgrd and os_mibs, to terminate. Fixes a problem to prevent the relocation of a UFS read/write
file system to the original node. Provides a new option to the mountd daemon to specify a port
number for mountd to bind to. Corrects a problem in which a DRD event thread may run
indefinitely while responding to a bid server transaction. Fixes an AdvFS domain panic caused by cfsd. Corrects a problem in CAA in which a resource does not fail
over when two resources have the same values for the
FAILOVER_DELAY and REQUIRED_RESOURCES attributes. Fixes a hang during cluster bootup caused by early
reservation conflicts. Provides enhancements to the caa_relocate command. Provides a new command, clu_ping, to determine the status of
the interconnects in a stretched cluster environment. Improves CFS client writing to do the
following: reduce the logging of ERROR 69 for user disk space
quota exhausted; support partial write success; increase the interconnect transfer size for multi-page
synchronous writes; and prevent read ahead past the end of a file.
Helps ensure more accurate block reservation accounting in
CFS. Addresses an issue seen on Tru64 UNIX LAN clusters, whereby
a booting node may panic with "lock_wait" while spawning threads
for cluster interconnect channels. Displays a warning message if deleting
a particular cluster member would stop NTP services for the rest
of the cluster. Improves the routing failover mechanism when one or more
network interfaces on any cluster member fail. Fixes a "kernel memory fault" panic in
cfs_fo_failover_done(). Fixes a problem wherein the DRD subsystem may cause a system
panic when strategy routines are called from a light weight
context (LWC). Fixes display errors in the cfsstat command when using the
icschanbps option. Fixes a deadlock issue between cluster nodes because of
cfs_async_io_thread running on them. Corrects an erroneous error message displayed by
drdmgr. Fixes a cnx_qdisk_thread hang problem. Fixes a memory leak in CFS. Fixes disk I/O hang in DRD. Fixes a hang with disklabel that occurs if a local open
fails for the same disk simultaneously. Fixes incorrect CFS token structure warnings. Prevents file inconsistency due to a race between lookup and
remove. Provides a new cluster-specific link aggregation
distribution algorithm when using LAG in a LAN cluster. Fixes a simple lock timeout panic issue in kch and a
possible hang at boot time. Prevents an AIO DirectIO operation from returning invalid data while
reading a fragged file. Fixes a cluster hang issue during cluster boot-up, when
local disk open operations fail while disklabel is in
progress. Fixes an error in the DRD subsystem wherein uninitialized
disk attributes can cause a system panic. Fixes a KMF in the rdg_get_completion() routine. Fixes a problem in which the cluster alias subsystem tries to
free an mbuf that has already been freed by the ICS subsystem. Corrects reference counting issues within the DRD subsystem
that can prevent the deletion of hwids. Adds a new option, custom_gated, to cluamgr and
aliasd. Fixes a deadlock that can happen during failover of global
root and var file systems when vfast is enabled on them. Fixes resource leaks seen after a locked device file is
revoked. Fixes system panics seen on relocating file systems with
locked revoked devices. Fixes a problem with CAA placement policy when host names in
"HOSTING MEMBERS" are in uppercase letters. Corrects a problem in which CAA is incorrectly showing the
status of network resources on a halted member. Fixes a problem in cfs block reservation code where cfs
attempted to release a lock more than once. Introduces a code tracing capability of the aliasd and
aliasd_niff daemons to improve troubleshooting. Prevents a race that can occur during the planned relocation
of a file system. Improves the reliability of the DRD subsystem when faced
with tape devices and tape device failures. Introduces a mechanism to improve reliability for
synchronizing cluster alias ID sets among cluster members. Fixes the cause of the following CNX panic in cluster
reconfiguration:

    cnx_change_cluster_tx_state: illegal transaction state

Fixes an ICS panic issue that occurs early in the boot
process.
Fixes a problem that causes the cluster alias
manager SUITlet to falsely interpret any cluster alias with
virtual={t|f} configured as a virtual alias regardless of its
actual setting. Corrects problems in which SysMan drdmgr dumps
tcl stack when a user tries to manage devices or file systems of a
cluster node that is down. Corrects an issue to allow the Device Request
Dispatcher, DRD, to retry to get disk attributes when EINPROGRESS
is returned from the disk driver. Addresses issues with “address already in use”
messages from klogin and kshell.
Corrects a potential security vulnerability in
CAA. Fixes a kernel memory fault. Corrects a problem in which the MC-API call
imc_ckerrcnt_mr() incorrectly returns an error status, although the
function's error count parameter is not increasing. Preserves the error code from an asynchronous
write error on a CFS client and returns the error from the close()
system call. Fixes a Distributed Lock Manager panic when
calling the dlm_get_lkinfo() routine passing an lkid of a lock
block that has already been declared dead by the deadlock
detection thread. Corrects a problem to allow the use of 255 in
the LAN Interconnect IP address. Fixes a CFS client panic during a file system
read operation where the server goes down and the client itself
becomes the server and attempts to release the direct I/O token
that had already been released. Fixes a panic during a forced unmount of a nonfailoverable file
system (that is, NFS and AutoFS) in the case that the
initiator is down. Enables a cluster to boot even if the cluster
root domain devices are private to different cluster members.
Although this is not a recommended configuration, it should not
result in an unbootable cluster. Currently, this is with respect
to cluster root domains not under LSM control. Corrects a potential data inconsistency caused
by a problem in the CFS block reservation code, which incorrectly
calculates the amount of space requested and used by direct I/O
writes. Resolves a kernel memory fault in
m_copym. Fixes a problem with the -b option of
caa_report. Fixes a problem with caa_stop -f by allowing
the administrator to reset a resource state from UNKNOWN to
OFFLINE even if the hosting member is down. Corrects a potential data inconsistency that
may occur when a domain is nearly full. Client write requests
shipped synchronously to the server will no longer have subsets of
pages written asynchronously due to a race with virtual
memory. Improves the scaling of IP reassembly code on
large SMP machines. NFS servers are especially susceptible when a
large number of clients attempt to write at the same time. Helps to close a race where synchronous writes
may obtain disk allocations that were promised to cached client
writes. Fixes a problem in which CAA might prevent
alias-based services from properly functioning by binding to one
of the cluster alias reserved ports. Corrects a problem in a Memory Channel cluster
where rebooting a node without performing a hardware reset can
crash other members with a RM_AUDIT_ACK_BLOCK panic. Fixes a problem in the Memory Channel
driver. Improves the responsiveness of EINPROGRESS
handling during the issuing of I/O barriers by removing a possible
infinite loop scenario that could occur due to the deletion of a
storage device. Fixes a problem that causes a panic with the
message "CNX MGR: Invalid configuration for cluster seq disk"
during simultaneous booting of cluster nodes. Fixes the panic "CNX MGR: Invalid
configuration for cluster seq disk" that occurs during the
simultaneous booting of cluster nodes. Fixes a possible race condition between a SCSI
reservation conflict and an I/O drain that can result in a
hang. Alleviates a condition in which a cluster
member takes an extremely long time to boot when using LSM. Fixes a problem in which relocating AutoFS with caa_relocate does
not kill the autofsd daemon. Allows rewrites when the domain is close to
out of space. Ensures correct processing in the close()
system call. Provides a CAA action script that can be used
by a NIS Slave running to help assign a crontab entry to update
NIS maps. Fixes a problem in which a cluster member
leaves the cluster alias yet continues to respond to it. Corrects a problem that causes applications
(including cluamgr) to get a dummy cluster alias reported from the
cluaioc_get_nextalias() call. The IP address for this alias is
0.0.0.0. Fixes a problem in which aliasd creates
multiple similar virtual subnet static routes in the
gated.conf.memberX, thereby causing gated to fail to load. Fixes issues associated with the
initialization of the Memory Channel driver. Provides a function to query the status of
aliasd. Fixes an IPv6 bind problem in a cluster
environment. Fixes multiple disable or enable problems with
cluamgr. Fixes a tok_wait hang problem on Sierra
Clusters. Adds the ability to change the default
interconnect interface name. Corrects several problems in the cluster
install and upgrade utilities. Fixes a problem in which an RDG (Reliable
DataGram) kernel thread can starve other timeshare threads on a
uniprocessor cluster member. In particular, system services such
as networking threads can be affected. Fixes minor issues with cfsstat command-line
options and return values. Prevents panics seen with cluster server-only
(for example, MFS) mounts. Fixes a condition that causes the panic
pg_nwriters going negative when ubc_page_release() is called from
cfs_getpage(). Corrects a problem in the RDG component in
which multiple Oracle instances are unable to be properly
configured when using RDG over a LAN rather than Memory
Channel. Provides a sticky connection feature for a
cluster alias. Updates sysconfig to use the cluster
interconnect, allowing for a greater SSI collaboration. This will
help with changing variables on hung systems, single user systems,
and normal running systems. Improves device error processing in
drd. Corrects a boot hang problem seen on
large-scale Sierra Cluster configurations caused by a missed wake
up in the kernel group services code. Alters the behavior of the cluster NFS client
with TCP mounts so that when a remote server is down, the cluster
NFS client will use nonreserved ports to see if the remote server
is up. Introduces a new CFS tunable attribute that
may benefit the performance of client reads of clone files under
certain circumstances. Addresses an assertion caused by a bad user
pointer passed to the kernel via sys_call. Corrects a condition that results in excessive
context switching and CPU load due to a heavy use of the cluster
alias on large SMP and NUMA machines. Enhances /sbin/advfs/tag2name to print out the
name of the associated directory, given the tag of an index
file. Increases performance scalability and extends
the reliability of the Internode Communications Subsystem in a
cluster configured with Memory Channel as the cluster
interconnect. Improves detection of possible race conditions
during CFS recovery. Adds a cluster panic facility to the
kernel. Addresses the following issues: new ICS server daemons and handles
are created one at a time each time the low water mark for
each is reached, which causes a nanny daemon to be called
more frequently than it needs to be; and no mechanism exists for the user to
adjust the high and low water marks for ICS free handles,
which can result in poor performance during rapidly
increasing loads.
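
The cost of creating handles one at a time can be seen with a small simulation. The Python sketch below is not the ICS implementation: the function name count_nanny_wakeups, its parameters, and the idea of topping the pool up to the high water mark in one batch are assumptions made purely to illustrate why refilling in batches between a low and a high water mark wakes the replenishing daemon less often.

```python
def count_nanny_wakeups(demand: int, low_water: int, high_water: int,
                        batch_refill: bool) -> int:
    """Count how often a replenishing daemon runs while `demand` handles are taken.

    Hypothetical model: the pool starts full, each request consumes one free
    handle, and the daemon is woken whenever the free count reaches the low
    water mark.
    """
    if not 0 <= low_water < high_water:
        raise ValueError("require 0 <= low_water < high_water")
    free = high_water            # start with a full pool
    wakeups = 0
    for _ in range(demand):
        free -= 1                # one handle is consumed
        if free <= low_water:    # low water mark reached: wake the daemon
            wakeups += 1
            # Either top up to the high water mark, or create a single handle.
            free = high_water if batch_refill else free + 1
    return wakeups


if __name__ == "__main__":
    one_at_a_time = count_nanny_wakeups(100, 2, 10, batch_refill=False)
    batched = count_nanny_wakeups(100, 2, 10, batch_refill=True)
    print(f"one at a time: {one_at_a_time} wakeups, batched: {batched} wakeups")
```

In this model, widening the gap between the two marks reduces the number of wakeups further, which is one reason adjustable water marks can help under rapidly increasing load.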
Fixes a problem in which cluster alias
connections are not distributed among cluster members according to
the defined selection weight. Fixes a memory leak in the cluster alias
subsystem. Fixes an issue with ICS (Internode
Communication Services) on a NUMA-based system in a
cluster. Fixes a problem in the cluster kernel in which
a cluster member panics while doing remote I/O over the
interconnect. Fixes a hang that occurs when multiple nodes
are shutting down simultaneously; fixes a Cluster File System
panic that occurs when using raw Asynchronous I/O; and provides
additional code to assist in problem diagnosis. Corrects a problem in which a panic displaying
the message “error CNX MGR: cnx_comm_error: invalid node state”
occurs on a LAN cluster running under load when other members are
rebooting. Addresses an error in which caa_register -u
produces with no balance data. Addresses a resource inaccessibility issue
that can occur if the hosting member crashes during a remote
caa_stop operation. Updates the attributes on a directory when
files are removed by a cluster node that is not the file system
server. Fixes a problem associated with non-SCSI
storage. Corrects a potential security vulnerability in
the cluster interconnect security configuration that may result in
a denial of service (DoS) on systems running TruCluster Server
software. Causes UDP datagrams that do not come from the
correct port to be discarded. Addresses a node hang that occurs during the
testing of Memory Channel cable pulls. A cluster member may hang
when a Memory Channel cable is pulled, the node is taken down, the
cable is plugged back in, and the node is rebooted. Fixes a cluster deadlock that may occur during
a failover and recovery when direct I/O is in use. Fixes a race condition in the Device Request
Dispatcher. Corrects a condition that can cause excessive
FIDS_LOCK contention when a large number of files are using
system-based file locking. Fixes a problem with cfsd core dumping shortly
after startup if it is enabled or shortly after enabling it. The
problem fixed by this patch is only seen after applying a recent
dsfmgr patch. Corrects diagnostic code that could cause a
panic during a kernel boot. Eliminates a performance problem when a node
acting as CFS server of an NFS client file system is
write-appending to an external NFS server. Prevents a panic when an AutoFS file system is
auto-unmounted. Corrects the cause of a cluster member panic
with a kernel memory fault when running nmap or nessus against
the cluster alias. Resolves a problem in which the caa_register
command allows a CAA resource to be registered even when its
profile contains an unknown attribute. This fix prevents the
caa_register command from registering a resource with an unknown
attribute and will cause it to return an error message that
includes the unknown attribute information. Fixes a condition in which uptimes greater
than 100 percent are reported for resources by caa_report. Fixes a problem in which resources that never
started have an ending timestamp. Fixes a problem in which CAA dumps core when
trying to deal with cluster member ID 63. Fixes a problem where access to the quorum
disk can be lost if the quorum disk is on a parallel SCSI bus and
multiple bus resets are encountered. Relieves pressure on the CMS global DLM lock
by allowing AutoFS auto unmounts to back off. Fixes cfsmgr to properly return a failure
status when a relocation request has failed. Fixes a race condition where stale name cache
entries allow file access after file unlink. Corrects a problem in which cfsd will
terminate prematurely and core dump when a node leaves the cluster
very shortly after joining the cluster. Fixes a timing window during asynchronous
reads on a CFS client. Fixes a panic that may occur during an
unmount. Corrects several problems with various
installation commands and utilities. Fixes a memory leak in the clu_get_info
interface. Enhances cluster file system performance when
using file locks to coordinate file access. Causes the correct error message for freezefs
-q to be displayed on a non-AdvFS file system. Fixes a problem in one of the shipped rc
scripts whereby Oracle fails during startup on a clustered
system. Addresses a panic that occurs on a booting
node. Fixes a coding error, a memory leak, and a
deinitialization problem in the cluster interconnect networking
layer. Fixes a problem in the Device Request
Dispatcher. Provides clu_upgrade enhancements. Increases performance by reducing the lock
miss rate in the ics_mct_llnode_info_lock. Addresses the panic “Assert Failed:
(cp->c_flags & CDIRECTIO) = 0” in the cluster file
system. Corrects a problem where a CFS lookup for a
mount could leave stale state behind that could adversely affect
subsequent NFS operations. Fixes an internal problem in the kernel's
AdvFS, UFS, and NFS file systems where extended attributes with
extremely long names, greater than 247 characters, could not be
set on files. The new limit is 254 characters plus a null string
terminator. Corrects problems with LSM disks and the
cluster quorum tools. When a member having LSM disks local to it
is down, the quorum tools fail to update quorum. This causes other
cluster commands to fail. Corrects a problem in which mounting on a
directory in a clone fileset fails with the message "Device
Busy." Prevents a Kernel Memory Fault Panic in some
cases where AdvFS administration commands are performed on a
mounted fileset of an inaccessible AdvFS domain. Fixes a problem in which CAAD might dump core
due to a race condition when multiple events to which it
subscribes arrive simultaneously. Improves the fragment gathering mechanism to
boost performance. Fixes a panic that occurs when attempting to unload
clua.mod. Fixes a condition that causes a boot-up panic
when ipport_userreserved is 1000 or less. Fixes a cfsmgr core dump when passing the
incorrect number of arguments upon force unmounting a served file
system. Fixes a problem in which a CFS client for a
file with a hole preceding a frag might drop the frag. Optimizes cluster file system lock recovery,
potentially speeding up the time required to failover a file
system to a new server. Corrects a condition in which superfluous
"rm_event, index too big" messages may appear on system
consoles. Addresses a panic that may occur when a node
is joining the cluster. A node recognizing the joining node panics
while it is trying to establish a preboot channel connection with
the peer node, causing the following message to be displayed on
the console or in /var/adm/messages:

    panic (cpu x): ics_mct: rx conn 3

Corrects the LSM partition types in the CNX
partition of boot disk for the clu_partmgr utility. Modifies the aliasd daemon to include
interface aliases when determining whether or not an interface is
appropriate for use as the ARP address for a cluster alias when
selecting the proxy ARP master. Fixes the potential of multiple assert_wait
and timeout panics due to kernel EVM threads not properly
preempting. Fixes a problem in the Memory Channel
driver. Corrects a condition that occurs during a
rolling upgrade in which the clu_ifaccess script removes the tag
file for /etc/ifaccess and sends out a warning message. Forces a reboot to resolve communications
problems in a two node cluster rather than hang. Corrects lock acquires after mpsleep. Causes a rebuild delay remainder to be
minimally second. Allows the cluster to provide new functions to
the dupatch command before a member is rolled, and also provides a
mechanism for backing out the added functions. Addresses a memory leak in the Memory Channel
transport layer. Fixes a problem in which a system may panic
with a kernel memory fault when a device that is being opened by
one program is being deleted with the hwmgr utility. Fixes a condition that causes a panic when a
valid NFS packet with corrupted embedded length field is
received. Fixes a condition that causes an unnecessary
panic due to request connection deregistration with an invalid IP
address. Provides performance improvement for CFS
filesets mounted with the server_only option. A log sync for
create transactions is not needed for such filesets. Fixes a problem with single physical rail
Memory Channel configurations and cleans up stale data left on an
off-line physical rail by the Memory Channel driver. Fixes a rare cluster hang caused by deadlocks
that occurred between the CFS client and server during multiple
write operations. Fixes multiple problems seen with the
TruCluster RDG component, including panics of the following types
"rdg: unwiring", "vl_unwire: page is not wired", and "KMF: from
_otsmove." Allows users to add new members and create a
cluster with different netmasks. Removes member0-specific installation files on
an undo install, which could prevent the reinstallation of the
patch. Allows users to continue forward when they add
a member to a one-node cluster during a rolling upgrade or rolling
patch. Enables CAA to start up and fail over system
services before any of the user services. Fixes an unaligned kernel access in the
cluster I/O stack. Addresses a potential hang in the NFS server
that occurs when file systems are being relocated in a
cluster. Provides the ability to lower the
cluster_rebuild_delay. Fixes the long delay during an NFS connection
failover when the servicing cluster member dies. Fixes a panic in clua.mod that is caused by
receiving a delete-cnx-request from a member when that cnx is in
the UNREGISTER state. Fixes a reconnection problem when an interface
comes down and then goes up. Fixes a panic problem in clua.mod that occurs
when max_aliasid is increased and aliases are added. Fixes a situation that causes a core dump in
aliasd when all interfaces are removed on a cluster member that is
set up with at least one cluster alias that was added with
virtual=t and without a subnet. Fixes a problem when disabling and re-enabling
cluster alias source route on a given interface. Fixes a problem where clua.mod does not handle
TCP RST messages appropriately. Fixes a problem of restoring static routes
when an interface revives. Corrects a problem in which a rolling upgrade
stops advancing when adding a cluster member to a one-node
cluster. Fixes an initialization issue with the
internode communications subsystem. Corrects a problem in which a domain panic on
the cluster_root does not result as it should in a regular panic
for the cluster node on which the domain panic occurs. Fixes several small issues with
clu_upgrade: A "process not found" message displayed when finishing
the setup stage of clu_upgrade has been removed. The ability to roll on a one-node cluster is
maintained.
Addresses a problem on LAN clusters related to
improper keep-alive timeouts that can be identified when the
following console message is displayed during normal operations
(that is, no known failures and no nodes are
rebooting):

    WARNING: ics_socket_event: error 60 on channel 0, assume node # is down

Fixes a problem that occurs when the
interconnect is configured using NetRAIN, cluster_rebuild_delay is
set significantly below the default value, and members are
rebooting or failures are occurring on the active links. The
console message seen when this occurs is “CNX QDISK: Yielding to
foreign owner with provisional quorum.” Fixes a problem in which I/O barriers may be
stalled when a drive becomes hung. Prevents write failures from a cluster NFS
client that may occur when a second user without write access is
concurrently reading the file. Fixes a problem that occurs during reboots on
a heavily loaded cluster using the LAN interconnect and generates
the following messages:

    WARNING: ics_socket_event: error 54 on channel 0
    WARNING: ics_socket_event: error 60 on channel 0

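
To check whether a system is logging the ics_socket_event warnings quoted above, a log file can be scanned for them. The Python sketch below counts such warnings per error number and channel. The function name count_ics_socket_errors is hypothetical, and the sketch assumes that each warning appears on a single line of a text log (for example, a copy of /var/adm/messages) in exactly the wording shown in these notes.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the console warning quoted in these notes; any trailing text such
# as ", assume node # is down" is ignored.
ICS_WARNING_RE = re.compile(
    r"WARNING: ics_socket_event: error (\d+) on channel (\d+)"
)


def count_ics_socket_errors(lines: Iterable[str]) -> Counter:
    """Return a Counter keyed by (error number, channel number)."""
    counts: Counter = Counter()
    for line in lines:
        match = ICS_WARNING_RE.search(line)
        if match:
            error, channel = int(match.group(1)), int(match.group(2))
            counts[(error, channel)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "WARNING: ics_socket_event: error 54 on channel 0",
        "WARNING: ics_socket_event: error 60 on channel 0, assume node 2 is down",
        "an unrelated console message",
    ]
    for (error, channel), seen in sorted(count_ics_socket_errors(sample).items()):
        print(f"error {error} on channel {channel}: {seen} occurrence(s)")
```

To scan a real file, pass an open file object instead of the sample list, since iterating over a file yields its lines.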
Fixes a KMF in
drd_kgs_bid_stop_server_io_drained when a node leaves during a drd
kgs transaction. Corrects a problem in which drd continually
tries to perform a munsa unreject on the drive when a device is
deleted while it is in the munsa reject state. Corrects a problem in which multiple path
failures cause drd to return ENODEV even when a server is
available in the cluster. Fixes several error-handling issues in drd for device
error conditions. Fixes a problem in which a device cannot be
opened due to heavy load on the device. Fixes a problem in which a CD-ROM is not
mountable in a cluster. Fixes loss of quorum disk. Makes quorum disk parameters
configurable. Eliminates a window for kernel memory fault
panics on AdvFS system calls that are performed via function
shipping using the clu_msfs_syscall_fship routine. Fixes a Sierra Cluster KCH set free race
condition. Fixes two errors in clu_upgrade that prevent
completing the setup stage. Prevents a get_cs_toks() KMF/assert
crash. Fixes a rm_audit_sync_block panic that occurs
when using a long fiber as the Memory Channel interconnect. Fixes a timing window in the Internode
Communications Subsystem ddr device error handling. Fixes the rm_audit_sync_block panic when using
a long fiber with VHUB as the Memory Channel interconnect. Fixes clu_bdmgr to facilitate CLSM sliced
disks for cluster_root domain. Modifies the manner of checking for user file
limits for CFS remote DIO writes. Ensures that signals for EFBIG writes are
properly generated on a client. Ensures the correct processing of CFS in
future releases. Fixes a multiple free problem of 32-byte
memory bucket caused by multiple callbacks from KCH to
CLUA. Fixes an incorrect if statement, which
although a low-risk problem, could block access to a disk
device. Corrects a confusing error message. Fixes a problem seen in a LAN cluster when the
CPUs on a member system are not installed contiguously in the
lower order slots. Allows the quorum disk to be used in spite of
transient errors with the quorum disk hardware. Corrects an internal logic error that causes
the performance of file deletion to be suboptimal. Fixes a deadlock that occurs when no members
have valid paths to a device and all the nodes in the cluster are
attempting failover at the same time. Fixes problems seen in the TruCluster RDG
component. Fixes a race condition in a routine that
allocates memory for Memory Channel logical rail and physical rail
use. It prevents a KMF during boot, occasionally seen on some
AlphaServer GS1280 systems. Fixes a race condition which leads to a panic
that occurs when a device is deleted on a busy system. Adds the ability to log enabled DRD events to
a circular memory buffer. Corrects an Invalid Current Server
panic. Increases tolerance for intermittent boot
disk errors early in the boot process. Corrects a problem in which I/O operations
hang when I/O barriers fail due to the loss of access to
drives. Fixes a TruCluster NFS server failure that
occurs when clients access file systems forcibly removed with the
cfsmgr -u command. Fixes an incorrect return status for
asynchronous direct I/O reads in a cluster if the read request
goes beyond the end of the file (EOF). Fixes the problem of unintentional loading of
gated when nogated is specified with other requested cluamgr
operations. Fixes a problem in which backplane RAID
devices can become inaccessible. Provides the following tape-related
fixes: Corrects a problem in which hwmgr redirect commands
fail on tape devices. Prevents the reuse of a dsk number upon deleting and
adding a new tape. Corrects a problem in which drdmgr commands can hang
on tapes. Updates the code base to make failbacks more
proactive.
Improves defenses against user error during
the roll stage of a rolling upgrade. Fixes a TruCluster Distributed Lock Manager
(DLM) system panic caused by lock transaction IDs being out of sync
after a rebuild. Corrects a problem in which the TruCluster
component DRD (Device Request Dispatcher) does not always return
standard error codes. Prevents a kernel memory fault panic when
drd_open is called on a device with a valid local path that has no
local devt passed in, and this member has the lowest cluster ID of
any member in the cluster. Prevents CFS token sequence number reuse
errors on fast systems. Prevents a domain panic on a file system that is
local to a failed cluster member. Prevents CFS write() from updating file access
time or panicking on a directory. Modifies the way the clu_upgrade command
behaves regarding the availability of backup space in the setup
and preinstall stages and adds an appropriate error
message. Corrects a problem within the TruCluster
Kernel Group Services (kgs/kch) subsystem in which the
simultaneous booting of multiple nodes may result in a panic due
to an unknown node in a remote member node list. Removes a delay in the TruCluster component
DRD (Device Request Dispatcher) event threads during system
booting. Corrects a kernel memory fault in
drd_local_device_close. Fixes a kernel memory fault issue on LAN-based
clusters that do not have a Memory Channel adapter installed on
the systems. Fixes a problem in which non-root users are unable
to execute the caa_stat command. Provides enhancements to CAA commands and the
caad daemon. Resolves a resource exhaustion problem in the
TruCluster kgs/kch subsystem on high-end clusters, typically with
large storage configurations. Fixes an assert failure in the CFS
server. Resolves a problem that occurs when adjusting
sysconfig clua attributes sticky_entry_timeout and
sticky_db_cleanup_interval. Ensures that if only a portion of an AIO/DIO
write completes, the correct number of bytes written will be
returned. Allows CFS to correctly handle a token race
condition without creating a panic. Prevents a node in a cluster from hanging at
boot time. Corrects misspellings of file system in the
cfsmgr utility. Implements the fast fail policy within
DRD. Corrects a problem in which backplane RAID
devices can become inaccessible when installed on systems running
Version 5.1B-2 (Patch Kit 4). Enhances the fuser command to provide
cluster-wide query capability. Ensures that the number of icsmct receive
threads does not exceed the number of CPUs. Corrects a condition in which
drd_get_disk_attributes hangs if too many errors are encountered,
causing new devices to be inaccessible in a cluster from some
cluster members. Corrects a problem in which a cluster CFS
client would panic in cfscall_writepages, reporting ASSERT (error
!= EDQUOT). This correction eliminates that failure and allows
proper writing up to the fileset quota and to the end of
space for a domain. Fixes a rare, three-way deadlock condition
when Internode Communication Services (ICS) traffic is in a
throttled state and a cluster member that is participating in the
throttled traffic is halted. Fixes a kernel memory fault in strlen on a
cluster member during a mount of an AdvFS file system with an
improperly specified file system. Allows the ulimit -f command to function
correctly in a cluster. Prevents a kernel memory fault panic that may
occur with client writes on nearly full domains. Prevents a panic on a device close when device
connectivity is lost. Fixes a kernel memory fault (KMF) that occurs when mounting a partitioning file
system in a cluster. Fixes a problem in which a CAA resource and
its dependents become inaccessible when the resource fails to
start on the node to which it is failed over and there are no more
nodes to consider for failover. Fixes an Oracle socket connection problem. Fixes incorrect error handling that could
result in a memory leak. Provides event definitions for traps in
cluster MIB files to support OpenView NMS. Modifies ics_tcp to check the response buffer for
NULL before freeing it. Fixes a problem in which boot times in
excess of 2 hours occur in a two-node LAN cluster using an ee
(DE6xx) adapter as the cluster interconnect and connected directly
by a crossover cable. Corrects a scenario during a cluster member
boot whereby a booting member may cause booted members to panic on
a kernel memory fault shortly after the messages "Registering CMS
Services" and "rm slave" are printed to the booting console
for each MC card. Fixes a problem that could cause the system
panic "clua_realloc_port: corrupt list pointers panic". Corrects trapOID for traps generated from the
clu_mibs subagent and provides event definitions for traps in MIB
files to support OpenView. Fixes an inappropriate message that is
displayed during CAA resource relocation when invoked from
SysMan. Fixes a 64-byte memory leak in the drd/kgs
interface. Modifies CNX to check for communication errors
while a node joins the cluster. Fixes a synchronization issue with a cluster
alias ID set among cluster members. Prevents a panic from occurring during a
failover mount if the AdvFS on-disk file system ID (fsid) does not
match the current cluster-wide fsid for the file system. Fixes an intermittent core issue in the aliasd
daemon caused by improper handling of the interface list. Fixes an assertion panic
"set-num_rmt_mbr_nodes = 0". Prevents a single-node panic in a cluster that
can occur under the following conditions: A memory file system of size 4GB or greater is created
with the default 512-byte sector size. A memory file system of size 2GB or greater is created
with a 1024-byte sector size and other sector sizes.
Prevents a kernel memory fault panic that may
be seen under certain error conditions with MFS file
systems. Corrects a problem with kch memory
usage.
Patch 27002.00TCRMAN540 Provides a new command, clu_ping, to determine
the status of the interconnects in a stretched cluster
environment. Updates the caa_relocate(8) and cluamgr(8)
reference pages.
Revises the clua_services(4) and
sys_attrs_clua(5) TruCluster reference pages.