| DESCRIPTION This Engineering Advisory supersedes
previous EA BU040915_EW02.
A number of Tru64 UNIX customers have reported system
crashes while running Tru64 UNIX 5.1B-1/PK3 and 5.1B-2/PK4 on
platforms with 'Big Pages' enabled. Please note that the 'Big Pages'
feature is, by default, disabled on Tru64 UNIX, so most users will
not encounter this situation. We are actively working to determine
the root cause and to provide a solution.
If your customer encounters this problem, they should immediately
enter a problem report. In such cases, it is essential that L1 and
L2 support engineering provide full sys_check -escalate output
captured at the time of the crash. Please refer to the Tru64
Engineering case submission template for the location to forward all
related case files.
We have been working on two types of "hash failed" panic:
- ubc_wire: hash failed
- ubc_bigpage_alloc: hash failed
A workaround is available for each.
For the first type of hash failure (ubc_wire: hash failed) the
system is trying to deal with replicated pages on a non-NUMA system.
A valid workaround is to disable page replication for user code,
which is done by changing the following sysconfig parameter and
rebooting:
vm:
replicate_user_text = 0
This workaround should only be applied to non-NUMA systems. There
should be no negative performance impact; the only expected effect
of this change is that panics of this nature stop. There have been
no reports of this failure on Sierra clusters, but there is no risk
in setting this tunable on Sierra systems.
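
As a minimal sketch of applying the first workaround, the script
below writes the vm stanza shown above to a scratch file and prints
it for review. The file path is hypothetical, and the suggested merge
command assumes the standard Tru64 sysconfigdb utility with its -m
(merge) and -f (stanza file) options; verify against the sysconfigdb
reference page on the target system before use.

```shell
#!/bin/sh
# Scratch location for the stanza file (hypothetical path, not from the advisory).
STANZA=${STANZA:-/tmp/ubc_wire_workaround.stanza}

# Write the vm stanza exactly as given in the advisory.
cat > "$STANZA" <<'EOF'
vm:
        replicate_user_text = 0
EOF

# Show the stanza so it can be reviewed before it is merged.
cat "$STANZA"

# Assumption: sysconfigdb -m merges the stanza into /etc/sysconfigtab.
# The command is only printed here, never executed by this sketch.
echo "To apply on a non-NUMA Tru64 host (as root): sysconfigdb -m -f $STANZA vm, then reboot"
```

The script only prepares and displays the stanza; merging it and
rebooting remain manual steps so the change can be reviewed first.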
For the other panic (ubc_bigpage_alloc: hash failed), we recommend
disabling the segmentation tunable within Big Pages while leaving
Big Pages itself enabled. This change also avoids a kernel memory
fault from pmap_remove_all().
In sysconfig, make the following changes, then reboot:
vm:
vm_bigpg_enabled = 1
vm_bigpg_seg = 0
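
After the reboot, the running values can be checked. The helper below
is a sketch only: check_attr is a hypothetical function, and it
assumes that sysconfig -q prints one "name = value" line per
attribute under the subsystem name, as in the stanza above. It is
exercised here against sample text rather than a live system.

```shell
#!/bin/sh
# check_attr NAME EXPECTED (hypothetical helper)
# Reads "name = value" lines on stdin and succeeds only if NAME is
# present and its value equals EXPECTED.
check_attr() {
    awk -v name="$1" -v want="$2" '
        $1 == name && $2 == "=" { found = 1; ok = ($3 == want) }
        END { exit !(found && ok) }
    '
}

# Sample text in the assumed output format, for demonstration only.
sample='vm:
vm_bigpg_enabled = 1
vm_bigpg_seg = 0'

printf '%s\n' "$sample" | check_attr vm_bigpg_enabled 1 && echo "vm_bigpg_enabled ok"
printf '%s\n' "$sample" | check_attr vm_bigpg_seg 0 && echo "vm_bigpg_seg ok"

# On the Tru64 host the same check would read live values (assumed usage):
#   sysconfig -q vm vm_bigpg_enabled vm_bigpg_seg | check_attr vm_bigpg_seg 0
```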
Target Audience: Any customer running 5.1B-1/PK3 or 5.1B-2/PK4. |