drd(7)
NAME
drd - Distributed raw disk (DRD) device driver (provided on Production
Server configurations only)
DESCRIPTION
The drd driver is a pseudodevice driver that runs in Production Server
configurations. By providing an abstraction of physical storage, the drd
driver allows user-level applications to work without specific knowledge of
where within a cluster the underlying physical device resides.
The asemgr, drd_mknod, drd_ivp, and drd_balance utilities provide the
system management interface to the DRD subsystem. Use the asemgr utility
to configure DRD services and indicate their service policies. The
available server environment (ASE) selects a server for a DRD service based
on its service policy and starts the service on the server node. The
server node has physical connectivity to the DRD device participating in
the service, and issues requests to the underlying device driver. Other
nodes within the cluster that utilize the DRD devices in a DRD service are
called client nodes.
The drd driver on client and server nodes receives user requests through
conventional system calls such as open, close, read, write, and ioctl. For
this reason, the driver is considered to be a raw (or character) device
driver. Because it relies on an underlying physical device driver to
control the disk device, the drd driver is also considered a pseudodevice
driver.
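As an illustration of access through conventional system calls, the following is a minimal sketch of a user-level read. It is not taken from the DRD sources. The function name read_sectors and the 512-byte sector size are assumptions made for the example; raw devices generally expect transfers that are aligned to the sector size.

```python
import os

SECTOR_SIZE = 512  # assumed sector size, not stated in this reference page


def read_sectors(path, first_sector, count):
    """Read `count` sectors starting at `first_sector` using plain system calls.

    `path` would be a DRD device special file on a cluster member; any
    readable file works for trying the sketch elsewhere.
    """
    fd = os.open(path, os.O_RDONLY)          # open(2)
    try:
        os.lseek(fd, first_sector * SECTOR_SIZE, os.SEEK_SET)
        return os.read(fd, count * SECTOR_SIZE)  # read(2)
    finally:
        os.close(fd)                         # close(2)
```

On a cluster member, a call such as read_sectors("/dev/rdrd/drd4", 0, 1) would read the first sector of that DRD device, whether the device is served locally or by another node.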
When the drd driver receives a user request, it first determines whether
the node on which it is running is the server for the physical device that
the request targets, and then handles the request as follows:
· If the node that receives the user request is serving the physical
device that is the object of the request, the drd driver considers the
request to be a local request. The drd driver passes the local
request to the underlying physical device driver, such as the SCSI CAM
driver (see rz(7)) or the Logical Storage Manager (LSM) (see
volintro(8)).
· If the node that receives the user request is not serving the physical
device that is the object of the request, the drd driver considers the
request to be a remote request. The drd driver passes the remote
request across the network transport to the other node that is the
device's server node. The server node passes the request to the
underlying physical device driver. When the local physical device
driver completes the request, the server node returns the results and
status to the client node. The client node returns the results and
status to the calling user-level program.
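The local/remote decision described above can be sketched as a lookup in a device-to-server map. This is only a model of the behavior as this page describes it. The function name route_request, the node names, and the dictionary representation of the map are hypothetical.

```python
def route_request(this_node, device, server_map):
    """Classify a request as local or remote.

    server_map is a hypothetical dict that maps a device name to the
    name of the node currently serving it.
    Returns a (kind, target_node) tuple.
    """
    server = server_map[device]
    if server == this_node:
        # This node serves the device: hand the request to the
        # underlying physical device driver.
        return ("local", this_node)
    # Another node serves the device: ship the request to that node.
    return ("remote", server)
```

For example, with server_map = {"drd4": "nodeA"}, a request for drd4 issued on nodeA is classified as local, and the same request issued on nodeB is classified as remote with nodeA as its target.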
Attributes
You can tune the performance of the DRD subsystem by setting one or more
attributes in the /etc/sysconfigtab file. The default settings of these
attributes should be sufficient for most applications.
The following command shows the current settings of these attributes:
# sysconfig -q drd
Note
Although the complete set of DRD attributes is described in this
section, most are reserved for debugging, development, and testing
purposes. These reserved attributes are indicated by the phrase
<tuning not supported>. Modifying the default setting of these
attributes is not supported.
Some of these attributes (unless marked otherwise) can be dynamically
reconfigured. You can specify them in the sysconfig command, changing the
drd: stanza entry of the /etc/sysconfigtab file, and alter the behavior of
the DRD subsystem on a running system without needing to reboot the system.
For example, to change the value of the drd-print-info attribute to 1,
enter the following command:
# sysconfig -r drd drd-print-info=1
drd-print-info: reconfigured
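When checking which values are recorded for the drd: stanza, a small parser can help. The following sketch assumes that a stanza consists of a "name:" line followed by "attribute = value" lines and that lines starting with # are comments. That layout is an assumption and should be verified against the sysconfigtab documentation for your system. The function name parse_stanza is hypothetical.

```python
def parse_stanza(text, subsystem):
    """Return the attributes of one stanza as a dict of strings.

    Assumes stanzas of the form:
        drd:
            drd-print-info = 1
    """
    attrs = {}
    in_stanza = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and (assumed) comment lines
        if line.endswith(":") and "=" not in line:
            # A new stanza header; track whether it is the one we want.
            in_stanza = (line[:-1] == subsystem)
            continue
        if in_stanza and "=" in line:
            name, _, value = line.partition("=")
            attrs[name.strip()] = value.strip()
    return attrs
```

Values are returned as strings so that the caller decides how to interpret them.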
You can use the configuration manager framework, as described in the Tru64
UNIX System Administration manual, to change attributes and otherwise
administer the DRD subsystem on another host. To do this, set up the host
names in the /etc/cfgmgr.auth file on the remote client system and then
specify the -h flag to /sbin/sysconfig, as in the following example:
# sysconfig -h fcbra13 -r drd drd-do-local-io=0
drd-do-local-io: reconfigured
The following DRD attributes modify the operational behavior of the DRD
subsystem. They are for system-testing purposes only and should not be
modified.
drd-noop-open
By default, the DRD driver keeps the device open for the duration
of service. It discards all open operations. To force an open
call to be passed to the driver, specify drd-noop-open = 1.
<tuning not supported>
drd-noop-close
By default, the DRD driver discards all close operations. To
force a close call to be passed to the driver, specify
drd-noop-close = 1. <tuning not supported>
drd-open-key
When the value of this attribute is 1, all open calls are tested
to ensure that they have specified the open flag (O_DRD). This
flag allows applications to restrict DRD usage. <tuning not
supported>
drd-broadcast-mc-only
The various cluster members use a network broadcast mechanism to
determine the DRD map entries. The DRD map entries define which
node within the cluster is the block shipping service (BSS)
server of a specific disk device. When this attribute is
nonzero, the broadcasts are transmitted only on the Memory
Channel network interfaces. In this manner, broadcast activity
is constrained to the cluster interconnect. <tuning not
supported>
drd-do-broadcast
When this attribute is nonzero, the DRD map specification code
broadcasts new map entries on server nodes. In this manner,
client nodes are informed of which node is the server of a
specified disk. <tuning not supported>
drd-accept-mc-maps
When this attribute is nonzero, the DRD subsystem accepts all map
entries that were broadcast over the Memory Channel subnet from
the BSS server nodes. When this attribute is zero (0), only map
entries from a list of trusted nodes are accepted. <tuning not
supported>
drd-accept-all-maps
When this attribute is nonzero, the DRD subsystem accepts all map
entries that were broadcast over any network interconnect.
<tuning not supported>
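The acceptance rules for broadcast map entries described by the drd-accept-mc-maps and drd-accept-all-maps attributes can be sketched as a single predicate. This page does not state how the two attributes interact, so the order of the checks below is an assumption. The function and parameter names are hypothetical.

```python
def accept_map_entry(sender, via_memory_channel,
                     accept_all_maps, accept_mc_maps, trusted_nodes):
    """Decide whether a broadcast map entry should be accepted.

    sender             -- name of the node that broadcast the entry
    via_memory_channel -- True if it arrived over the Memory Channel subnet
    accept_all_maps    -- value of drd-accept-all-maps
    accept_mc_maps     -- value of drd-accept-mc-maps
    trusted_nodes      -- collection of trusted node names
    """
    if accept_all_maps:
        return True  # entries from any network interconnect are accepted
    if accept_mc_maps and via_memory_channel:
        return True  # entries from the Memory Channel subnet are accepted
    # Otherwise only entries from trusted nodes are accepted.
    return sender in trusted_nodes
```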
drd-mc-iodone-inline-all
When this attribute is nonzero, BSS command completion tasks for
both read and write operations are performed in the context of
the underlying physical device driver's iodone completion code.
The setting of this attribute supersedes that of the
drd-mc-iodone-inline-writes attribute. <tuning not supported>
drd-mc-iodone-inline-writes
When this attribute is nonzero, the BSS command completion tasks
for write operations are performed in the context of the
underlying physical device driver's iodone completion code. The
setting of this attribute is superseded by that of the
drd-mc-iodone-inline-all attribute. <tuning not supported>
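The precedence between the drd-mc-iodone-inline-all and drd-mc-iodone-inline-writes attributes can be sketched as follows. This models only what the two descriptions above state. The function name completes_inline is hypothetical.

```python
def completes_inline(op, inline_all, inline_writes):
    """Return True if BSS completion for `op` runs in the iodone context.

    op            -- "read" or "write"
    inline_all    -- value of drd-mc-iodone-inline-all
    inline_writes -- value of drd-mc-iodone-inline-writes
    """
    if op not in ("read", "write"):
        raise ValueError("op must be 'read' or 'write'")
    if inline_all:
        return True           # supersedes the inline-writes setting
    if inline_writes:
        return op == "write"  # only write completions are inline
    return False
```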
drd-mc-iodone-nthreads
Specifies the number of BSS iodone completion threads to run on
this member system. An iodone completion thread is a kernel
thread within the DRD subsystem that performs BSS completion
operations. It runs in response to a trigger in the iodone
completion path of the underlying physical device driver.
<tuning not supported>
drd-bss-rm-peer2peer
When this attribute is nonzero, it enables peer-to-peer direct
memory access (DMA) between the host's SCSI storage controller
and the Memory Channel adapter on the same PCI bus. When enabled
on a cluster member system, peer-to-peer DMA is used for all
remote DRD read requests processed by that member system as a DRD
server. Data read in response to the request is sent directly to
the Memory Channel adapter from the disk controller, which avoids
an intermediate transfer by way of main memory.
This attribute is set at boot time by the drd_dma utility, as
long as all SCSI controllers and Memory Channel adapters used by
the DRD server are on the same PCI bus. If you manually set this
attribute using the sysconfig command, you must ensure that this
configuration requirement is met and that no DRD disks are
currently active. If DRD disks are active at the time you enter
the command, the system may panic.
The drd-bss-rm-peer2peer attribute is incompatible with the
drd-data-compare attribute. A request to enable the
drd-bss-rm-peer2peer attribute will fail if data checksumming
over the Memory Channel interconnect is enabled within the DRD
subsystem.
drd-bss-p2p-root-allowed
When this attribute is nonzero, it disables the check in the DRD
pseudodevice driver that prevents the creation of DRD services
using disks on the same SCSI bus on which the disk that holds the
root file system resides. Compaq does not support this type of
configuration. Setting this attribute may prevent the enabling of
peer-to-peer DMA on certain member systems. <tuning not
supported>
drd-suspend
When this attribute is reconfigured to a nonzero value, it
suspends all DRD I/O operations. When DRD operations are
suspended and this attribute is reconfigured to 0 (zero), DRD I/O
operations are resumed. When this attribute is queried, a
nonzero value indicates that the DRD subsystem's operations have
been suspended.
The following DRD attributes are used to debug the DRD subsystem (for
example, by forcing error conditions or modifying the frequency of display
messages):
drd-print-info
When this attribute is nonzero, informational debug messages are
displayed on the system console. Setting this attribute to a
value greater than 1 causes additional low-priority messages to
be displayed.
drd-print-warn
When this attribute is nonzero, warning debug messages are
displayed on the system console.
drd-mc-print-info
When this attribute is nonzero, informational messages for the
Memory Channel specific portions of DRD are displayed on the
system console. Setting this attribute to a value greater than 1
causes additional low-priority messages to be displayed.
drd-mc-print-warn
When this attribute is nonzero, informational and warning
messages for the Memory Channel specific portions of DRD are
displayed on the system console.
drd-data-compare
When this attribute is set to 1, 2, or 3, the DRD subsystem
performs a checksum of the data portion of read and write
requests. For proper operation, this attribute must be set to
the same value on all cluster members.
When this attribute is 0, no data checksumming or comparison is
performed.
When this attribute is 1, the bsc_stats.bsc_read_miscompares stat
counter is incremented on DRD client read miscompares and the
bss_stats.bss_write_miscompares stat counter is incremented on
DRD server write miscompares.
When this attribute is 2, the stat counters are incremented as
appropriate and one of the following error messages is written to
the console and kernel log files:
bsc_do_unmap_RM: READ check sum failure server = # client = #
bsc_rm_docopyinout: READ checksum failure server = # client = #
bss_rm_server: WRITE checksum failure client = # server = #
When this attribute is 3, the stat counters are incremented as
appropriate, the pertinent messages are written to the log files,
and the system panics.
All cluster members must use the same drd-data-compare value.
Otherwise, some cluster members will not initialize the checksum
value, causing other members to erroneously report that data
corruption has occurred.
The drd-data-compare attribute is incompatible with the
drd-bss-rm-peer2peer attribute. A request to set drd-data-compare
will fail if peer-to-peer DMA is enabled. <tuning not supported>
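The four drd-data-compare levels described above are cumulative, which the following sketch makes explicit. It is a model of the documented behavior, not the subsystem's code. The function names are hypothetical.

```python
def miscompare_actions(level):
    """Return the actions taken on a checksum miscompare at a given level."""
    if level not in (0, 1, 2, 3):
        raise ValueError("drd-data-compare must be 0, 1, 2, or 3")
    return {
        "count": level >= 1,  # increment the miscompare stat counter
        "log": level >= 2,    # write a message to console and kernel log
        "panic": level == 3,  # panic the system
    }


def values_consistent(member_values):
    """Check that every cluster member uses the same drd-data-compare value.

    member_values is a hypothetical dict of member name to value.
    """
    return len(set(member_values.values())) <= 1
```

The second function reflects the requirement that all cluster members use the same value. How the values are collected from each member is left to the caller.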
drd-disk-drain
When this attribute is reconfigured (for example, by a sysconfig
-r drd drd-disk-drain=1 command), all pending I/O operations will
be completed before the reconfigure call returns.
drd-do-bss-hist
When this attribute is nonzero, a histogram is recorded on the
number of block shipping service daemon (bssd) threads. To print
this histogram on a DRD server, start the dbx debugger on a
system running the kernel and enter the p bss_sv_active_hist
command.
Use the -t flag to the drd_ivp utility to verify the setting of
the drd-do-bss-hist attribute.
drd-fail-local-io
When this debug testing attribute is nonzero, all local I/O
operations are returned with an [EIO] error status. <tuning not
supported>
drd-fail-remote-io
When this debug testing attribute is nonzero, all remote I/O
operations are returned with an [EIO] error status. <tuning not
supported>
drd-nomap-failover
Normally, when a block shipping client (BSC) node receives
errors on its I/O requests, it deletes the map entry after
several retries. This is done so that subsequent retries will
obtain a new map entry from the server. When this debug
attribute is set, map entries are not deleted. <tuning not
supported>
drd-skip-dlm-chk
When this attribute is nonzero, the DRD subsystem will not ensure
that appropriate distributed lock manager (DLM) sequence numbers
have been specified in I/O requests. <tuning not supported>
drd-state Displays the current state of the DRD subsystem state flags.
This is a read-only attribute.
drd-major-version
Specifies the major version number of the DRD subsystem.
drd-minor-version
Specifies the minor version number of the DRD subsystem.
cmajnum Specifies the character driver major number for the DRD device
driver. The device special files in the /dev/rdrd directory are
created with a matching major number.
bmajnum Specifies the block driver major number for the DRD device
driver. This attribute is set to -1 when no block interface has
been configured.
Module_name
Contains a string that identifies the DRD subsystem ("Distributed
Raw Disk - TruCluster Subsystem").
The following DRD attributes are for performance analysis. Typically, they
cause I/O requests to return immediately, which avoids various calling
sequences.
drd-loopback
When this attribute is nonzero, the block shipping service/block
shipping client (BSS/BSC) code path is exercised locally via IP
loopback to debug on a single node. <tuning not supported>
drd-do-local-io
When this attribute is set to 1, all local disk requests are
passed to the underlying driver as expected. When this attribute
is set to zero (0) for performance analysis, the read and write
requests are not sent to the underlying driver. Rather, they are
returned immediately with a success status without performing the
actual I/O transfer. Setting this attribute to zero (0) does not
disable I/O requests received by a BSS; to do that, use the
drd-do-remote-io attribute. <tuning not supported>
drd-do-remote-io
Similar to the drd-do-local-io attribute, except that this
attribute applies to all remote requests sent to the BSS. When this
attribute is set to zero (0), all client requests go out over the
wire, but, at the server side, the disk driver is not called and
a successful response is immediately returned. <tuning not
supported>
drd-usedrc
Tells the bssd daemon that it should use a duplicate request
cache (DRC). A DRC protects against the harmful replay of Remote
Procedure Call (RPC) requests in the unlikely event of network-
related problems. Using the DRC adds overhead to each normally
successful request; the bssd daemon processing the request
searches the DRC for a match before starting each operation, and
then copies the results. This attribute cannot be reconfigured
after the system is up and running. <tuning not supported>
drd-do-clnt-call
Used for performance analysis of the code path. When it is zero
(0), RPC calls are not sent remotely; they immediately return a
success status.
<tuning not supported>
drd-requeue
Controls the usage of a performance optimization in the BSS
server by governing how RPC requests are received. When set to a
nonzero value, this attribute causes the bssd daemons to be
awakened out of the netisr threads to process incoming requests.
When this attribute is zero (0), the bssd daemons are directly
dispatched, which avoids a context switch. This attribute is not
set by default because the DRD subsystem does not handle read and
write requests by RPC over the Memory Channel interconnect.
<tuning not supported>
drd-disable-mc
An optimized code path is used for DRD reads and writes. This
code path uses the Memory Channel interconnect, and bypasses the
UDP/IP code path used for conventional RPC requests. When this
attribute is set, the optimized code path is not used, and DRD
reads and writes are performed over UDP/IP by way of RPC calls.
<tuning not supported>
drd-subsys-stat, drd-disk-stat, drd-que-stat,
drd-remote-disk-stat, drd-io-size-stats,
drd-map-query
These attributes pass DRD subsystem performance statistics to
performance monitoring utilities. They contain binary data
structures, which are not displayed by conventional sysconfig
commands. <tuning not supported>
drd-mc-maxphys
Specifies the size of the largest single I/O transfer that is
supported when using the Memory Channel interconnect. I/O
requests that are larger than this value are fragmented. The
value of this attribute is determined by the corresponding device
driver maxphys interface of all supported underlying device
driver types. <tuning not supported>
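The fragmentation of requests larger than drd-mc-maxphys can be pictured with the following sketch. How the subsystem actually splits a request is not documented on this page. The sketch simply assumes contiguous fragments of at most maxphys bytes. The function name fragment_request is hypothetical.

```python
def fragment_request(offset, length, maxphys):
    """Split one I/O request into (offset, length) fragments.

    Each fragment is at most `maxphys` bytes long, and the fragments
    cover the original byte range in order.
    """
    if maxphys <= 0:
        raise ValueError("maxphys must be positive")
    fragments = []
    while length > 0:
        chunk = min(length, maxphys)
        fragments.append((offset, chunk))
        offset += chunk
        length -= chunk
    return fragments
```

A request that already fits within maxphys comes back as a single fragment.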
drd-mc-iodone-nthreads-run
Shows the number of bssd iodone completion threads that are
running on the member system. This attribute can only be queried.
To change the number of bssd iodone completion threads on a
member, use the drd-mc-iodone-nthreads attribute.
drd-bss-rm-iodone-bind
Causes each bssd iodone completion thread to be bound to a
nonprimary CPU. When this attribute is nonzero, no iodone thread
will be run on the primary CPU. When this attribute is zero (0),
iodone threads may be scheduled on any available CPU. <tuning
not supported>
drd-io-size-enable
When this attribute is nonzero, the DRD kernel subsystem
maintains counters that describe the size of read and write
operations. By default, this attribute is set to zero (0) to
avoid the overhead of maintaining these counters.
The following DRD attributes are for tuning purposes. They may change the
number of retries or timeout intervals.
drd-bssd-busy
Specifies the percentage of time when all of the bssd daemons are
busy concurrently serving RPC requests. A value exceeding 10%
indicates that more bssd threads should be run. Use the -t flag
of the drd_ivp utility to verify the value of this attribute.
drd-maphash-size
Specifies the number of chains in the DRD subsystem's member/disk
map entry hash table. This value must be a power of 2 in the
range of 8 to 1024. Use the -t flag of the drd_ivp utility to
verify the value of this attribute.
The DRD subsystem describes which cluster member is serving each
disk in a set of map entries that are collected in a number of
chains in a hash table.
drd-max-hash-length
Specifies the length of the largest chain in the DRD subsystem's
member/disk map entry hash table. When this value exceeds 50
entries, increase the number of hash chains using the
drd-maphash-size attribute. Use the -t flag of the drd_ivp utility
to verify the value of this attribute.
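The constraints on drd-maphash-size and the growth hint tied to drd-max-hash-length can be checked with a few lines of code. This is a sketch of the rules as stated above. The function names are hypothetical.

```python
def is_valid_maphash_size(n):
    """True if n is a power of 2 in the range 8 to 1024, inclusive."""
    # For a positive integer, n & (n - 1) == 0 holds only for powers of 2.
    return 8 <= n <= 1024 and (n & (n - 1)) == 0


def should_grow_hash(max_hash_length):
    """True if the longest chain exceeds the 50-entry guideline."""
    return max_hash_length > 50
```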
drd-retry-seconds
Specifies the number of seconds that the DRD subsystem will wait
while trying to process a remote request before returning a "No
such device" error to the calling application. The value of this
attribute dictates how long the DRD subsystem waits to determine
which member is serving a requested disk before timing out. The
default value of this attribute results in a virtually infinite
waiting period and is sufficient for most applications. The
drd-retry-seconds attribute should be set to a lower value for a
configuration in which applications must be aware of a disk or
service failure.
drd-broadcast-query-retries
drd-broadcast-query-nsleeps
Specify the number of times the DRD subsystem will retry a remote
request and the time interval between retries. The value of the
drd-retry-seconds attribute is the product of the values of
these two attributes. <tuning not supported>
drd-base-timeo
Specifies the command timeout interval (in 1/10-second
increments) at the RPC level. To determine how much time is
spent retrying at the RPC level, multiply this attribute by the
value of the drd-rpc-retries attribute. <tuning not supported>
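The two products described above (the overall retry period and the time spent retrying at the RPC level) amount to simple arithmetic. The sketch below assumes that the interval given by drd-broadcast-query-nsleeps is expressed in seconds, which this page implies but does not state. The function names are hypothetical.

```python
def retry_seconds(query_retries, query_nsleeps):
    """Overall retry period: number of retries times the interval.

    Assumes the interval is in seconds.
    """
    return query_retries * query_nsleeps


def rpc_retry_seconds(base_timeo, rpc_retries):
    """Time spent retrying at the RPC level, in seconds.

    base_timeo is in 1/10-second increments, so the product is
    divided by 10 to convert it to seconds.
    """
    return base_timeo * rpc_retries / 10.0
```

For example, 10 retries at a 6-second interval give a 60-second retry period, and a base timeout of 11 tenths of a second with 5 RPC retries gives 5.5 seconds at the RPC level. These input values are illustrative only and are not the defaults.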
drd-mc-rd-segsize
Adjusts the size of each segment created to support read
operations over the Memory Channel interconnect.
The DRD subsystem allocates read and write segments in response
to the first remote I/O operation, and never allocates a segment
for an inactive cluster member. It creates a segment for each
type of operation (read and write) and for each client/server
pair in the cluster.
Specify this value as a number of bytes. If the number does not
represent an even multiple of 512 KB, it is rounded up to the
next multiple. Note that read segments are larger than write
segments to take advantage of the Memory Channel hardware's
ability to write directly to process space pages. Consequently,
if your application issues many I/O requests involving data
smaller than 8192 bytes and buffers that are not aligned on
8192-byte boundaries, you may need to allocate up to four times
more space for read segments than for write segments.
The DRD statistics monitoring utilities display the percentage of
remote Memory Channel read and write operations that are stalled
waiting for sufficient segment space to issue the request. If
these values exceed 10%, increase the segment size by adjusting the
drd-mc-rd-segsize and drd-mc-wr-segsize attributes. After
modifying these attributes, you must reboot the system for them
to take effect.
Use the -t flag to the drd_ivp utility to verify the setting of
the drd-mc-rd-segsize and drd-mc-wr-segsize attributes.
drd-mc-wr-segsize
Adjusts the size of each segment created to support write
operations over the Memory Channel interconnect.
The DRD subsystem allocates read and write segments in response
to the first remote I/O operation, and never allocates a segment
for an inactive cluster member. It creates a segment for each
type of operation (read and write) and for each client/server
pair in the cluster.
Specify this value as a number of bytes. If the number does not
represent an even multiple of 512 KB, it is rounded up to the
next multiple.
The DRD statistics-monitoring utilities display the percentage of
remote Memory Channel read and write operations that are stalled
waiting for enough segment space to issue the request. If these
values exceed 10%, increase the segment size by adjusting the
drd-mc-rd-segsize and drd-mc-wr-segsize attributes. After modifying
these attributes, you must reboot the system for them to take
effect.
Use the -t flag to the drd_ivp utility to verify the setting of
the drd-mc-rd-segsize and drd-mc-wr-segsize attributes.
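The rounding rule that applies to both drd-mc-rd-segsize and drd-mc-wr-segsize can be sketched as follows. The sketch assumes that 512 KB means 512 * 1024 bytes. The function name round_up_segsize is hypothetical.

```python
SEGMENT_UNIT = 512 * 1024  # 512 KB, assumed to mean 524288 bytes


def round_up_segsize(nbytes):
    """Round a requested segment size up to the next multiple of 512 KB."""
    if nbytes <= 0:
        raise ValueError("segment size must be positive")
    # Integer ceiling division, then scale back to bytes.
    return ((nbytes + SEGMENT_UNIT - 1) // SEGMENT_UNIT) * SEGMENT_UNIT
```

A value that is already a multiple of 512 KB is returned unchanged.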
drd-rpc-retries
Specifies the number of times failed commands are retried at the
RPC level. <tuning not supported>
drd-bp-low-water
Controls the behavior of a private memory pool of buf structures
used by the DRD subsystem.
drd-bp-high-water
Controls the behavior of a private memory pool of buf structures
used by the DRD subsystem.
drd-bp-increment
Controls the behavior of a private memory pool of buf structures
used by the DRD subsystem.
drd-bssd-max
Specifies the maximum number of bssd threads. The bssd command
specifies an attribute indicating how many threads to spawn.
That value cannot exceed the value of the drd-bssd-max attribute.
drd-bsc-biod-max
Specifies the maximum number of bsc_biod threads. The bsc_biod
command specifies an attribute indicating how many threads to
spawn. That value cannot exceed the value of the
drd-bsc-biod-max attribute.
The following DRD attributes are used to configure the DRD subsystem. They
can be neither queried nor tuned.
· Device_Block_Major
· Device_Char_Files
· Device_Char_Major
· Device_Char_Minor
· Device_Dir
· Device_Mode
· Module_Config1
· Module_Config_Name
· Module_Type
RESTRICTIONS
The supported set of generic disk ioctls includes the following:
· DEVIOCGET
· DEVGETGEOM
· DEVGETINFO
· DIOCGDINFO
· DIOCSDINFO
· DIOCWDINFO
· DIOCWLABEL
· DIOCGDEFPT
· DIOCGCURPT
All disk-related maintenance ioctls (as used in the disklabel and scu
commands) must be issued to the underlying physical device driver. For
example, the scu command specifies /dev/rrz2c and not /dev/rdrd/drd4.
No bdevsw (block device) interfaces are provided. Consequently, you cannot
mount a file system on a drd device.
The drd driver does not provide exclusive open semantics. This restriction
implies that you cannot use drd to remotely serve a tape device. Tape
device support is further precluded by the fact that drd may fragment read
and write requests based on the underlying network transport. When a
device is being served by drd, the subsystem will verify that it is a disk
type device.
FILES
/dev/rdrd/drdx
A DRD device special file (where x is a number assigned at the
time the asemgr utility configures the DRD service). The device
special file /dev/rdrd/drd0 is reserved for DRD control
functions.
RELATED INFORMATION
devio(7), asemgr(8), bsc_biod(8), bssd(8), drd_balance(8), drd_dma(8),
drd_ivp(8), drd_mknod(8)