drd(7)

NAME

drd - Distributed raw disk (DRD) device driver (provided on Production Server configurations only)

DESCRIPTION

The drd driver is a pseudodevice driver that runs in Production Server configurations. By providing an abstraction of physical storage, the drd driver allows user-level applications to work without specific knowledge of where within a cluster the underlying physical device resides.

The asemgr, drd_mknod, drd_ivp, and drd_balance utilities provide the system management interface to the DRD subsystem. Use the asemgr utility to configure DRD services and indicate their service policies. The Available Server Environment (ASE) selects a server for a DRD service based on its service policy and starts the service on the server node. The server node has physical connectivity to the DRD device participating in the service, and issues requests to the underlying device driver. Other nodes within the cluster that use the DRD devices in a DRD service are called client nodes.

The drd driver on client and server nodes receives user requests through conventional system calls such as open, close, read, write, and ioctl. For this reason, the driver is considered to be a raw (or character) device driver. Because it relies on an underlying physical device driver to control the disk device, the drd driver is also considered a pseudodevice driver.

When the drd driver receives a user request, it first determines whether the node on which it is running is the server of the physical device that is the object of the request, and then handles the request as follows:

· If the node that receives the user request is serving the physical device that is the object of the request, the drd driver considers the request to be a local request. The drd driver passes the local request to the underlying physical device driver, such as the SCSI CAM driver (see rz(7)) or the Logical Storage Manager (LSM) (see volintro(8)).

· If the node that receives the user request is not serving the physical device that is the object of the request, the drd driver considers the request to be a remote request.
The drd driver passes the remote request across the network transport to the node that is the device's server node. The server node passes the request to the underlying physical device driver. When the local physical device driver completes the request, the server node returns the results and status to the client node. The client node returns the results and status to the calling user-level program.

Attributes

You can tune the performance of the DRD subsystem by setting one or more attributes in the /etc/sysconfigtab file. The default settings of these attributes should be sufficient for most applications. The following command shows the current settings of these attributes:

    # sysconfig -q drd

Note

Although the complete set of DRD attributes is described in this section, most are reserved for debugging, development, and testing purposes. These reserved attributes are indicated by the phrase <tuning not supported>. Modifying the default setting of these attributes is not supported.

Some of these attributes (unless marked otherwise) can be dynamically reconfigured. You can specify them in a sysconfig command or in the drd: stanza entry of the /etc/sysconfigtab file; reconfiguring with the sysconfig command alters the behavior of the DRD subsystem on a running system without a reboot. For example, to change the value of the drd-print-info attribute to 1, enter the following command:

    # sysconfig -r drd drd-print-info=1
    drd-print-info: reconfigured

You can use the configuration manager framework, as described in the Tru64 UNIX System Administration manual, to change attributes and otherwise administer the DRD subsystem on another host. To do this, set up the host names in the /etc/cfgmgr.auth file on the remote client system and then specify the -h flag to /sbin/sysconfig, as in the following example:

    # sysconfig -h fcbra13 -r drd drd-do-local-io=0
    drd-do-local-io: reconfigured

The following DRD attributes modify the operational behavior of the DRD subsystem.
They are for system-testing purposes only and should not be modified.

drd-noop-open
    By default, the DRD driver keeps the device open for the duration of service. It discards all open operations. To force an open call to be passed to the driver, specify drd-noop-open = 1. <tuning not supported>

drd-noop-close
    By default, the DRD driver discards all close operations. To force a close call to be passed to the driver, specify drd-noop-close = 1. <tuning not supported>

drd-open-key
    When the value of this attribute is 1, all open calls are tested to ensure that they have specified the open flag (O_DRD). This flag allows applications to restrict DRD usage. <tuning not supported>

drd-broadcast-mc-only
    The various cluster members use a network broadcast mechanism to determine the DRD map entries. The DRD map entries define which node within the cluster is the block shipping service (BSS) server of a specific disk device. When this attribute is nonzero, the broadcasts are transmitted only on the Memory Channel network interfaces. In this manner, broadcast activity is constrained to the cluster interconnect. <tuning not supported>

drd-do-broadcast
    When this attribute is nonzero, the DRD map specification code broadcasts new map entries on server nodes. In this manner, client nodes are informed of which node is the server of a specified disk. <tuning not supported>

drd-accept-mc-maps
    When this attribute is nonzero, the DRD subsystem accepts all map entries that were broadcast over the Memory Channel subnet from the BSS server nodes. When this attribute is zero (0), only map entries from a list of trusted nodes are accepted. <tuning not supported>

drd-accept-all-maps
    When this attribute is nonzero, the DRD subsystem accepts all map entries that were broadcast over any network interconnect.
    <tuning not supported>

drd-mc-iodone-inline-all
    When this attribute is nonzero, BSS command completion tasks for both read and write operations are performed in the context of the underlying physical device driver's iodone completion code. The setting of this attribute supersedes that of the drd-mc-iodone-inline-writes attribute. <tuning not supported>

drd-mc-iodone-inline-writes
    When this attribute is nonzero, the BSS command completion tasks for write operations are performed in the context of the underlying physical device driver's iodone completion code. The setting of this attribute is superseded by that of the drd-mc-iodone-inline-all attribute. <tuning not supported>

drd-mc-iodone-nthreads
    Specifies the number of BSS iodone completion threads to run on this member system. An iodone completion thread is a kernel thread within the DRD subsystem that performs BSS completion operations. It runs based on a trigger in the iodone completion path of the underlying physical device driver. <tuning not supported>

drd-bss-rm-peer2peer
    When this attribute is nonzero, it enables peer-to-peer direct-memory-access (DMA) between the host's SCSI storage controller and the Memory Channel adapter on the same PCI bus. When enabled on a cluster member system, peer-to-peer DMA is used for all remote DRD read requests processed by that member system as a DRD server. Data read in response to the request is sent directly to the Memory Channel adapter from the disk controller, which avoids an intermediate transfer by way of main memory.

    This attribute is set at boot time by the drd_dma utility, as long as all SCSI controllers and Memory Channel adapters used by the DRD server are on the same PCI bus. If you manually set this attribute using the sysconfig command, you must ensure that this configuration requirement is met and that no DRD disks are currently active. If DRD disks are active at the time you enter the command, the system may panic.
    The drd-bss-rm-peer2peer attribute is incompatible with the drd-data-compare attribute. A request to enable the drd-bss-rm-peer2peer attribute will fail if data checksumming over the Memory Channel interconnect is enabled within the DRD subsystem.

drd-bss-p2p-root-allowed
    When this attribute is nonzero, it disables the check in the DRD pseudodevice driver that prevents the creation of DRD services using disks on the same SCSI bus on which the disk that holds the root file system resides. Compaq does not support this type of configuration. Setting this attribute may prevent the enabling of peer-to-peer DMA on certain member systems. <tuning not supported>

drd-suspend
    When this attribute is reconfigured to a nonzero value, it suspends all DRD I/O operations. When DRD operations are suspended and this attribute is reconfigured to 0 (zero), DRD I/O operations are resumed. When this attribute is queried, a nonzero value indicates that the DRD subsystem's operations have been suspended.

The following DRD attributes are used to debug the DRD subsystem (for example, by forcing error conditions or modifying the frequency of display messages):

drd-print-info
    When this attribute is nonzero, informational debug messages are displayed on the system console. Setting this attribute to a value greater than 1 causes additional low-priority messages to be displayed.

drd-print-warn
    When this attribute is nonzero, warning debug messages are displayed on the system console.

drd-mc-print-info
    When this attribute is nonzero, informational messages for the Memory Channel specific portions of DRD are displayed on the system console. Setting this attribute to a value greater than 1 causes additional low-priority messages to be displayed.

drd-mc-print-warn
    When this attribute is nonzero, informational and warning messages for the Memory Channel specific portions of DRD are displayed on the system console.

drd-data-compare
    When this attribute is set to 1, 2, or 3, the DRD subsystem performs a checksum of the data portion of read and write requests. For proper operation, this attribute must be set to the same value on all cluster members.

    When this attribute is 0, no data checksumming and comparisons are performed.

    When this attribute is 1, the bsc_stats.bsc_read_miscompares stat counter is incremented on DRD client read miscompares and the bss_stats.bss_write_miscompares stat counter is incremented on DRD server write miscompares.

    When this attribute is 2, the stat counters are incremented as appropriate and one of the following error messages is written to the console and kernel log files:

        bsc_do_unmap_RM: READ check sum failure server = # client = #
        bsc_rm_docopyinout: READ checksum failure server = # client = #
        bss_rm_server: WRITE checksum failure client = # server = #

    When this attribute is 3, the stat counters are incremented as appropriate, the pertinent messages are written to the log files, and the system panics.

    All cluster members must use the same drd-data-compare value. Otherwise, some cluster members will not initialize the checksum value, causing other members to erroneously report that data corruption has occurred.

    The drd-data-compare attribute is incompatible with the drd-bss-rm-peer2peer attribute. A request to set drd-data-compare will fail if peer-to-peer DMA is enabled. <tuning not supported>

drd-disk-drain
    When this attribute is reconfigured (for example, by a sysconfig -r drd drd-disk-drain=1 command), all pending I/O operations will be completed before the reconfigure call returns.

drd-do-bss-hist
    When this attribute is nonzero, a histogram is recorded of the number of active block shipping service daemon (bssd) threads. To print this histogram on a DRD server, start the dbx debugger on a system running the kernel and enter the p bss_sv_active_hist command.
    Use the -t flag of the drd_ivp utility to verify the setting of the drd-do-bss-hist attribute.

drd-fail-local-io
    When this debug testing attribute is nonzero, all local I/O operations are returned with an [EIO] error status. <tuning not supported>

drd-fail-remote-io
    When this debug testing attribute is nonzero, all remote I/O operations are returned with an [EIO] error status. <tuning not supported>

drd-nomap-failover
    Normally, when a block shipping client (BSC) node receives errors on its I/O requests, it deletes the map entry after several retries. This is done so that subsequent retries will obtain a new map entry from the server. When this debug attribute is set, map entries are not deleted. <tuning not supported>

drd-skip-dlm-chk
    When this attribute is nonzero, the DRD subsystem will not ensure that appropriate distributed lock manager (DLM) sequence numbers have been specified in I/O requests. <tuning not supported>

drd-state
    Displays the current DRD subsystem state flags. This is a read-only attribute.

drd-major-version
    Specifies the major version number of the DRD subsystem.

drd-minor-version
    Specifies the minor version number of the DRD subsystem.

cmajnum
    Specifies the character driver major number for the DRD device driver. The device special files in the /dev/rdrd directory are created with a matching major number.

bmajnum
    Specifies the block driver major number for the DRD device driver. This attribute is set to -1 when no block interface has been configured.

Module_name
    Contains a string that identifies the DRD subsystem ("Distributed Raw Disk - TruCluster Subsystem").

The following DRD attributes are for performance analysis. Typically, they cause I/O requests to return immediately, which avoids various calling sequences.

drd-loopback
    When this attribute is nonzero, the block shipping service/block shipping client (BSS/BSC) code path is exercised locally via IP loopback to debug on a single node.
    <tuning not supported>

drd-do-local-io
    When this attribute is set to 1, all local disk requests are passed to the underlying driver as expected. When this attribute is set to zero (0) for performance analysis, the read and write requests are not sent to the underlying driver. Rather, they are returned immediately with a success status without performing the actual I/O transfer. Setting this attribute to zero (0) does not disable I/O requests received by a BSS; to do that, use the drd-do-remote-io attribute. <tuning not supported>

drd-do-remote-io
    Similar to the drd-do-local-io attribute, except that this attribute applies to all remote requests to the BSS. When this attribute is set to zero (0), all client requests go out over the wire, but, at the server side, the disk driver is not called and a successful response is immediately returned. <tuning not supported>

drd-usedrc
    Tells the bssd daemon that it should use a duplicate request cache (DRC). A DRC protects against the harmful replay of Remote Procedure Call (RPC) requests in the unlikely event of network-related problems. Using the DRC adds overhead to each normally successful request; the bssd daemon processing the request searches the DRC for a match before starting each operation, and then copies the results. This attribute cannot be reconfigured after the system is up and running. <tuning not supported>

drd-do-clnt-call
    Used for performance analysis of the code path. When it is zero (0), RPC calls are not sent remotely; they immediately return a success status. <tuning not supported>

drd-requeue
    Controls the usage of a performance optimization in the BSS server by governing how RPC requests are received. When set to a nonzero value, this attribute causes the bssd daemons to be awakened out of the netisr threads to process incoming requests. When this attribute is zero (0), the bssd daemons are directly dispatched, which avoids a context switch.
    This attribute is not set by default because the DRD subsystem does not handle read and write requests by RPC over the Memory Channel interconnect. <tuning not supported>

drd-disable-mc
    An optimized code path is used for DRD reads and writes. This code path uses the Memory Channel interconnect, and bypasses the UDP/IP code path used for conventional RPC requests. When this attribute is set, the optimized code path is not used, and DRD reads and writes are performed over UDP/IP by means of RPC calls. <tuning not supported>

drd-subsys-stat, drd-disk-stat, drd-que-stat, drd-remote-disk-stat, drd-io-size-stats, drd-map-query
    Pass DRD subsystem performance statistics to performance monitoring utilities. They contain binary data structures, which are not displayed by conventional sysconfig commands. <tuning not supported>

drd-mc-maxphys
    Specifies the size of the largest single I/O transfer that is supported when using the Memory Channel interconnect. I/O requests that are larger than this value are fragmented. The value of this attribute is determined by the corresponding device driver maxphys interface of all supported underlying device driver types. <tuning not supported>

drd-mc-iodone-nthreads-run
    Shows the number of bssd iodone completion threads that are running on the member system. This attribute can only be queried. To change the number of bssd iodone completion threads on a member, use the drd-mc-iodone-nthreads attribute.

drd-bss-rm-iodone-bind
    Causes each bssd iodone completion thread to be bound to a nonprimary CPU. When this attribute is nonzero, no iodone thread will be run on the primary CPU. When this attribute is zero (0), iodone threads may be scheduled on any available CPU. <tuning not supported>

drd-io-size-enable
    When this attribute is nonzero, the DRD kernel subsystem maintains counters that describe the size of read and write operations. By default, this attribute is set to zero (0) to avoid the overhead of maintaining these counters.
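
The fragmentation mentioned under drd-mc-maxphys can be illustrated with a small sketch. This is not the driver's code: the function name fragment_request is hypothetical, and the sketch assumes the simplest possible policy (consecutive pieces, each no larger than the limit), which this page does not confirm.

```python
def fragment_request(offset, length, maxphys):
    """Split one transfer into consecutive (offset, length) pieces.

    Hypothetical illustration only: assumes each piece is at most
    `maxphys` bytes and that pieces are issued in ascending order.
    """
    if maxphys <= 0:
        raise ValueError("maxphys must be positive")
    if length < 0:
        raise ValueError("length must not be negative")

    pieces = []
    while length > 0:
        chunk = min(length, maxphys)   # never exceed the per-transfer limit
        pieces.append((offset, chunk))
        offset += chunk
        length -= chunk
    return pieces


if __name__ == "__main__":
    # A 10-byte request with a 4-byte limit becomes three pieces.
    for piece in fragment_request(0, 10, 4):
        print(piece)
```

A request that already fits within the limit comes back as a single piece, so callers of such a helper would not need a separate code path for small transfers.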

The following DRD attributes are for tuning purposes. They may change the number of retries or timeout intervals.

drd-bssd-busy
    Specifies the percentage of time when all of the bssd daemons are busy concurrently serving RPC requests. A value exceeding 10% indicates that more bssd threads should be run. Use the -t flag of the drd_ivp utility to verify the value of this attribute.

drd-maphash-size
    Specifies the number of chains in the DRD subsystem's member/disk map entry hash table. This value must be a power of 2 in the range of 8 to 1024. Use the -t flag of the drd_ivp utility to verify the value of this attribute.

    The DRD subsystem describes which cluster member is serving each disk in a set of map entries that are collected in a number of chains in a hash table.

drd-max-hash-length
    Specifies the length of the largest chain in the DRD subsystem's member/disk map entry hash table. When this value exceeds 50 entries, increase the number of hash chains using the drd-maphash-size attribute. Use the -t flag of the drd_ivp utility to verify the value of this attribute.

drd-retry-seconds
    Specifies the number of seconds that the DRD subsystem will wait while trying to process a remote request before returning a "No such device" error to the calling application. The value of this attribute dictates how long the DRD subsystem waits to determine which member is serving a requested disk before timing out. The default value of this attribute results in a virtually infinite waiting period and is sufficient for most applications. The drd-retry-seconds attribute should be set to a lower value for a configuration in which applications must be aware of a disk or service failure.

drd-broadcast-query-retries
drd-broadcast-query-nsleeps
    Specify the number of times the DRD subsystem will retry a remote request and the time interval between retries. The value of the drd-retry-seconds attribute is the product of these two attributes.
    <tuning not supported>

drd-base-timeo
    Specifies the command timeout interval (in 1/10-second increments) at the RPC level. To determine how much time is spent retrying at the RPC level, multiply this attribute by the value of the drd-rpc-retries attribute. <tuning not supported>

drd-mc-rd-segsize
    Adjusts the size of each segment created to support read operations over the Memory Channel interconnect. The DRD subsystem allocates read and write segments in response to the first remote I/O operation, and never allocates a segment for an inactive cluster member. It creates a segment for each type of operation (read and write) and for each client/server pair in the cluster.

    Specify this value as a number of bytes. If the number is not an exact multiple of 512 KB, it is rounded up to the next multiple.

    Note that read segments are larger than write segments to take advantage of the Memory Channel hardware's ability to write directly to process space pages. Consequently, if your application issues many I/O requests involving data smaller than 8192 bytes and buffers that are not aligned on 8192-byte boundaries, you may need to allocate up to four times more space for read segments than for write segments.

    The DRD statistics monitoring utilities display the percentage of remote Memory Channel read and write operations that are stalled waiting for sufficient segment space to issue the request. If these values exceed 10%, increase the segment size by adjusting the drd-mc-rd-segsize and drd-mc-wr-segsize attributes. After modifying these attributes, you must reboot the system for them to take effect.

    Use the -t flag of the drd_ivp utility to verify the setting of the drd-mc-rd-segsize and drd-mc-wr-segsize attributes.

drd-mc-wr-segsize
    Adjusts the size of each segment created to support write operations over the Memory Channel interconnect.
    The DRD subsystem allocates read and write segments in response to the first remote I/O operation, and never allocates a segment for an inactive cluster member. It creates a segment for each type of operation (read and write) and for each client/server pair in the cluster.

    Specify this value as a number of bytes. If the number is not an exact multiple of 512 KB, it is rounded up to the next multiple.

    The DRD statistics monitoring utilities display the percentage of remote Memory Channel read and write operations that are stalled waiting for enough segment space to issue the request. If these values exceed 10%, increase the segment size by adjusting the drd-mc-rd-segsize and drd-mc-wr-segsize attributes. After modifying these attributes, you must reboot the system for them to take effect.

    Use the -t flag of the drd_ivp utility to verify the setting of the drd-mc-rd-segsize and drd-mc-wr-segsize attributes.

drd-rpc-retries
    Specifies the number of times failed commands are retried at the RPC level. <tuning not supported>

drd-bp-low-water
    Controls the behavior of a private memory pool of buf structures used by the DRD subsystem.

drd-bp-high-water
    Controls the behavior of a private memory pool of buf structures used by the DRD subsystem.

drd-bp-increment
    Controls the behavior of a private memory pool of buf structures used by the DRD subsystem.

drd-bssd-max
    Specifies the maximum number of bssd threads. The bssd command specifies an attribute indicating how many threads to spawn. That value cannot exceed the value of the drd-bssd-max attribute.

drd-bsc-biod-max
    Specifies the maximum number of bsc_biod threads. The bsc_biod command specifies an attribute indicating how many threads to spawn. That value cannot exceed the value of the drd-bsc-biod-max attribute.

The following DRD attributes are used to configure the DRD subsystem. They can be neither queried nor tuned.

· Device_Block_Major
· Device_Char_Files
· Device_Char_Major
· Device_Char_Minor
· Device_Dir
· Device_Mode
· Module_Config1
· Module_Config_Name
· Module_Type
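
Several of the tuning attributes above are governed by simple arithmetic rules stated in their descriptions. The following sketch restates those rules so that a candidate value can be checked before it is placed in /etc/sysconfigtab. The function names are hypothetical and are not part of any DRD utility; only the rules themselves (power of 2 between 8 and 1024, rounding up to a multiple of 512 KB, the retry product, and the 1/10-second timeout unit) come from this page.

```python
SEGMENT_UNIT = 512 * 1024  # 512 KB, the rounding unit stated for segment sizes


def is_valid_maphash_size(n):
    """drd-maphash-size must be a power of 2 in the range 8 to 1024."""
    return 8 <= n <= 1024 and (n & (n - 1)) == 0


def round_segsize(nbytes):
    """Round a drd-mc-rd-segsize or drd-mc-wr-segsize value (in bytes)
    up to the next multiple of 512 KB, as the attribute text describes."""
    if nbytes <= 0:
        raise ValueError("segment size must be positive")
    return -(-nbytes // SEGMENT_UNIT) * SEGMENT_UNIT  # ceiling division


def retry_seconds(query_retries, query_nsleeps):
    """drd-retry-seconds is described as the product of
    drd-broadcast-query-retries and drd-broadcast-query-nsleeps."""
    return query_retries * query_nsleeps


def rpc_retry_time_seconds(base_timeo, rpc_retries):
    """Time spent retrying at the RPC level: drd-base-timeo (counted in
    1/10-second increments) times drd-rpc-retries, converted to seconds."""
    return base_timeo * rpc_retries / 10


if __name__ == "__main__":
    print(is_valid_maphash_size(64))      # a power of 2 inside the range
    print(round_segsize(600 * 1024))      # 600 KB rounds up to 1 MB
    print(retry_seconds(3, 5))
    print(rpc_retry_time_seconds(10, 3))
```

The input values used in the usage lines are arbitrary examples, not defaults of the DRD subsystem.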

RESTRICTIONS

The supported set of generic disk ioctls includes the following:

· DEVIOCGET
· DEVGETGEOM
· DEVGETINFO
· DIOCGDINFO
· DIOCSDINFO
· DIOCWDINFO
· DIOCWLABEL
· DIOCGDEFPT
· DIOCGCURPT

All disk-related maintenance ioctls (as used in the disklabel and scu commands) must be issued to the underlying physical device driver. For example, the scu command specifies /dev/rrz2c and not /dev/rdrd/drd4.

No bdevsw (block device) interfaces are provided. Consequently, you cannot mount a file system on a drd device.

The drd driver does not provide exclusive open semantics. This restriction implies that you cannot use drd to remotely serve a tape device. Tape device support is further precluded by the fact that drd may fragment read and write requests based on the underlying network transport. When a device is being served by drd, the subsystem will verify that it is a disk type device.

FILES

/dev/rdrd/drdx
    A DRD device special file (where x is a number assigned at the time the asemgr utility configures the DRD service). The device special file /dev/rdrd/drd0 is reserved for DRD control functions.
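
A management script that walks the /dev/rdrd directory may want to separate service devices from the reserved control device. The sketch below shows one way to do that under the naming rule given above; the helper names drd_unit and is_control_device are hypothetical, and the sketch only parses path names without opening any device.

```python
import re

# Pattern for the documented naming scheme /dev/rdrd/drdx, where x is a number.
_DRD_PATH = re.compile(r"/dev/rdrd/drd(\d+)")


def drd_unit(path):
    """Return the unit number x of a /dev/rdrd/drdx path, or None if the
    path does not follow that naming scheme."""
    match = _DRD_PATH.fullmatch(path)
    return int(match.group(1)) if match else None


def is_control_device(path):
    """True only for /dev/rdrd/drd0, which is reserved for DRD control
    functions."""
    return drd_unit(path) == 0


if __name__ == "__main__":
    for candidate in ("/dev/rdrd/drd0", "/dev/rdrd/drd4", "/dev/rrz2c"):
        print(candidate, drd_unit(candidate), is_control_device(candidate))
```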

RELATED INFORMATION

devio(7), asemgr(8), bsc_biod(8), bssd(8), drd_balance(8), drd_dma(8), drd_ivp(8), drd_mknod(8)
