9    Distributed Lock Manager

This chapter describes how to use the distributed lock manager (DLM) to synchronize access to shared resources in a cluster. It contains the following discussions:

Section 9.7 provides a code example showing the basic DLM operations.

9.1    Overview

The distributed lock manager (DLM) provides functions that allow cooperating processes in a cluster to synchronize access to a shared resource, such as a raw disk device, a file, or a program. For the DLM to effectively synchronize access to a shared resource, all processes in the cluster that share the resource must use DLM functions to control access to the resource.

DLM functions allow callers to:

Table 9-1 lists the functions the DLM provides. These functions are available in the libdlm library for use by applications.

Table 9-1:  Distributed Lock Manager Functions

Function Description
dlm_cancel Cancels a lock conversion request
dlm_cvt Synchronously converts an existing lock to a new mode
dlm_detach Detaches a process from all namespaces
dlm_get_lkinfo Obtains information about a lock request associated with a given process
dlm_get_rsbinfo Obtains locking information about resources managed by the DLM
dlm_glc_attach Attaches to an existing process lock group
dlm_glc_create Creates a group lock container
dlm_glc_destroy Destroys a group lock container
dlm_glc_detach Detaches from a process lock group
dlm_lock Synchronously requests a lock on a named resource
dlm_locktp Synchronously requests a lock on a named resource, using group locks and/or transaction IDs
dlm_notify Polls for outstanding completion and blocking notifications
dlm_nsjoin Joins the specified namespace
dlm_nsleave Leaves the specified namespace
dlm_perrno Prints the message text associated with a given DLM message ID
dlm_perror Prints the message text associated with a given DLM message ID, plus a caller-specified message string
dlm_quecvt Asynchronously converts an existing lock to a new mode
dlm_quelock Asynchronously requests a lock on a named resource
dlm_quelocktp Asynchronously requests a lock on a named resource, using group locks and/or transaction IDs
dlm_rd_attach Attaches a process or process lock group to a recovery domain
dlm_rd_collect Initiates the recovery procedure for a specified recovery domain by collecting those locks on resources in the domain that have invalid lock value blocks
dlm_rd_detach Detaches a process or process lock group from a recovery domain
dlm_rd_validate Completes the recovery procedure for a specified recovery domain by validating the resources in the specified recovery domain collection
dlm_set_signal Specifies the signal to be used for completion and blocking notifications
dlm_sperrno Obtains the character string associated with a given DLM message ID and stores it in a variable
dlm_unlock Releases a lock

The DLM itself does not ensure proper access to a resource. Rather, the processes accessing a resource agree to access the resource cooperatively, use DLM functions when doing so, and respect the rules for using the lock manager. These rules are as follows:

Because locks are owned by processes, applications that use the DLM must take into account the following points:

9.2    Resources

A resource can be any entity in a cluster (for example, a file, a data structure, a raw disk device, a database, or an executable program). When two or more processes access the same resource concurrently, they must often synchronize their access to the resource to obtain correct results.

The lock management functions allow processes to associate a name or binary data with a resource and synchronize access to that resource. Without synchronization, if one process is reading the resource while another is writing new data, the writer can quickly invalidate anything being read by the reader.

From the viewpoint of the DLM, a resource is created when a process (or a process on behalf of a DLM process group) first requests a lock on the resource's name. At that point, the DLM creates the structure that contains, among other things, the resource's lock queues and its lock value block. As long as at least one process owns a lock on the resource, the resource continues to exist. Once the last lock on the resource is dequeued, the DLM can delete the resource. Normally, a lock is dequeued by a call to the dlm_unlock function, but a lock (and potentially a resource as well) can be freed abnormally if the process exits unexpectedly.

9.2.1    Resource Granularity

Many resources can be divided into smaller parts. As long as a part of a resource can be identified by a resource name, the part can be locked.

Figure 9-1 shows a model of a database. The database is divided into volumes, which in turn are subdivided into files. Files are further divided into records, and the records are further divided into items.

The processes that request locks on the database shown in Figure 9-1 can lock the whole database, a volume in the database, a file, a record, or a single item. Locking the entire database is considered locking at a coarse granularity; locking a single item is considered locking at a fine granularity.

Parent locks and sublocks are the mechanism by which the DLM allows processes to achieve locking at various degrees of granularity. See Section 9.5.5 for more information about parent locks and sublocks.

Figure 9-1:  Model Database

9.2.2    Namespaces

A namespace can be viewed as a container for resource names. Multiple namespaces exist to provide separation of unrelated applications for reasons of security and modularity.

A namespace is qualified by effective user ID or effective group ID.

Access to a namespace based on a user ID is limited to holders of that user ID. Access to a namespace based on a group ID is limited to members of that group.

Security is based on determining a process's right to access the namespace, as evidenced by its holding the effective user ID or effective group ID. As a result, the user and group ID namespaces must be consistent across the cluster. After access to the namespace has been granted to a process, its individual locking operations within that namespace are unrestricted.

Cooperating processes must use the same namespace to coordinate locks for a given resource. A process must join a namespace by calling the dlm_nsjoin function before attempting to acquire a lock on a resource in that namespace. When the process calls the dlm_nsjoin function, the DLM verifies that the process is permitted to access the namespace by checking that it holds the group or user ID appropriate to that namespace. If the process passes this check, the DLM returns a handle to the namespace. The process must present this handle on subsequent calls to DLM functions to acquire root locks (that is, the base parent lock for a given resource in a namespace). You can add sublocks under root locks without further namespace access checks.

A process can be a member of up to DLM_NSPROCMAX namespaces.

9.2.3    Uniquely Identifying Resources

The DLM distinguishes resources by using the following attributes:

For example, the following two sets of attributes identify the same resource:

Attribute nsp resnam resnlen
Resource 1 14 disk1 5
Resource 1 14 disk1 5

The following two sets of attributes also identify the same resource:

Attribute nsp resnam resnlen
Resource 1 14 disk1 5
Resource 1 14 disk12345 5

The following two sets of attributes identify different resources:

Attribute nsp resnam resnlen parid
Resource 1 0 disk1 5 80
Resource 2 0 disk1 5 40
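
The comparison implied by these examples can be modeled in a few lines of C. This is a minimal standalone sketch and does not call the DLM: the resource_id structure and the same_resource function are hypothetical names, and the sketch assumes that the identifying attributes are exactly the four shown in the tables above (nsp, resnam, resnlen, and parid), with only the first resnlen bytes of resnam being significant and a parid of 0 standing for "no parent".

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model of the attributes that identify a resource. */
struct resource_id {
    int          nsp;      /* namespace handle */
    const char  *resnam;   /* resource name (may be binary data) */
    unsigned int resnlen;  /* number of significant bytes in resnam */
    int          parid;    /* parent lock ID; 0 models "no parent" */
};

/* Returns 1 if both attribute sets name the same resource, 0 otherwise. */
int same_resource(const struct resource_id *a, const struct resource_id *b)
{
    assert(a != NULL && b != NULL);

    if (a->nsp != b->nsp || a->parid != b->parid || a->resnlen != b->resnlen)
        return 0;

    /* Only the first resnlen bytes of the name take part in the comparison,
     * so "disk1" and "disk12345" match when resnlen is 5. */
    return memcmp(a->resnam, b->resnam, a->resnlen) == 0;
}
```

With this model, the attribute sets { 14, "disk1", 5 } and { 14, "disk12345", 5 } compare as the same resource, while two sets that differ only in parid (80 and 40) compare as different resources.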

9.3    Using Locks

To use distributed lock manager (DLM) functions, a process must request access to a resource (request a lock) using the dlm_lock, dlm_locktp, dlm_quelock, or dlm_quelocktp function. The request specifies the following parameters:

Null mode locks (see Section 9.3.1) are compatible with all other lock modes and are always granted immediately.

New locks are granted immediately in the following instances:

New locks are not granted in the following instance:

Processes can also use the dlm_cvt and dlm_quecvt functions to change the lock mode of a lock. This is called a lock conversion. See Section 9.3.4 for additional information.

9.3.1    Lock Modes

The mode of a lock determines whether or not the resource can be shared with other lock requests. Table 9-2 describes the six lock modes.

Table 9-2:  Lock Modes

Mode Description
Null (DLM_NLMODE) Grants no access to the resource; the Null mode is used as a placeholder for future lock conversions or as a means of preserving a resource and its context when no other locks on it exist.
Concurrent Read (DLM_CRMODE) Grants read access to the resource and allows it to be shared with other readers. The concurrent read mode is generally used when additional locking is being performed at a finer granularity with sublocks, or to read data from a resource in an unprotected fashion (allowing simultaneous writes to the resource).
Concurrent Write (DLM_CWMODE) Grants write access to the resource and allows it to be shared with other writers. The concurrent write mode is typically used to perform additional locking at a finer granularity, or to write in an unprotected fashion.
Protected Read (DLM_PRMODE) Grants read access to the resource and allows it to be shared with other readers. No writers are allowed access to the resource. This is the traditional share lock.
Protected Write (DLM_PWMODE) Grants write access to the resource and allows it to be shared with concurrent read mode readers. No other writers are allowed access to the resource. This is the traditional update lock.
Exclusive (DLM_EXMODE) Grants write access to the resource and prevents it from being shared with any other readers or writers. This is the traditional Exclusive lock.

9.3.2    Levels of Locking and Compatibility

Locks that allow the process to share a resource are called low-level locks; locks that allow the process almost exclusive access to a resource are called high-level locks. Null and Concurrent Read mode locks are considered low-level locks; Protected Write and Exclusive mode locks are considered high-level locks. The lock modes from lowest to highest level access modes are as follows:

  1. Null (NL)

  2. Concurrent Read (CR)

  3. Concurrent Write (CW) and Protected Read (PR)

  4. Protected Write (PW)

  5. Exclusive (EX)

The Concurrent Write (CW) and Protected Read (PR) modes are considered to be of equal level.

Locks that can be shared with other granted locks on a resource (that is, the resource's group grant mode) are said to have compatible lock modes. Higher-level lock modes are less compatible with other lock modes than are lower-level lock modes.

Table 9-3 shows the compatibility of the lock modes.

Table 9-3:  Compatibility of Lock Modes

                          Resource Group Grant Mode
Mode of Requested Lock    NL    CR    CW    PR    PW    EX
Null (NL)                 Yes   Yes   Yes   Yes   Yes   Yes
Concurrent Read (CR)      Yes   Yes   Yes   Yes   Yes   No
Concurrent Write (CW)     Yes   Yes   Yes   No    No    No
Protected Read (PR)       Yes   Yes   No    Yes   No    No
Protected Write (PW)      Yes   Yes   No    No    No    No
Exclusive (EX)            Yes   No    No    No    No    No
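
Table 9-3 translates directly into a lookup matrix. The following is a minimal standalone sketch that does not use the DLM library: the lock_mode enumeration and the modes_compatible function are hypothetical names chosen for illustration, and the matrix values are copied from the table.

```c
#include <assert.h>

/* Lock modes in the column order of Table 9-3 (hypothetical names). */
enum lock_mode { MODE_NL, MODE_CR, MODE_CW, MODE_PR, MODE_PW, MODE_EX, MODE_COUNT };

/* compat[requested][granted] is 1 where Table 9-3 says "Yes". */
static const int compat[MODE_COUNT][MODE_COUNT] = {
    /*          NL CR CW PR PW EX */
    /* NL */  { 1, 1, 1, 1, 1, 1 },
    /* CR */  { 1, 1, 1, 1, 1, 0 },
    /* CW */  { 1, 1, 1, 0, 0, 0 },
    /* PR */  { 1, 1, 0, 1, 0, 0 },
    /* PW */  { 1, 1, 0, 0, 0, 0 },
    /* EX */  { 1, 0, 0, 0, 0, 0 },
};

/* Returns 1 if a request for `requested` can share the resource with a
 * lock already granted at `granted`, 0 otherwise. */
int modes_compatible(enum lock_mode requested, enum lock_mode granted)
{
    assert(requested < MODE_COUNT && granted < MODE_COUNT);
    return compat[requested][granted];
}
```

For example, modes_compatible(MODE_PW, MODE_CR) yields 1, while modes_compatible(MODE_CW, MODE_PR) yields 0. The matrix is symmetric, which reflects that compatibility does not depend on which of the two locks was requested first.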

9.3.3    Lock Management Queues

A lock on a resource can be in one of the following three states:

A queue is associated with each of the three states, as shown in Figure 9-2.

Figure 9-2:  Three Lock Queues

When you request a new lock on an existing resource, the DLM determines if any other locks are waiting in either the conversion or waiting queue, as follows:

9.3.4    Lock Conversions

Lock conversions allow processes to change the mode of locks. For example, a process can maintain a low-level lock on a resource until it decides to limit access to the resource by requesting a lock conversion.

You specify lock conversions by using either the dlm_cvt or the dlm_quecvt function with the lock ID of a previously granted lock that you wish to convert. If the requested lock mode is compatible with the currently granted locks, the conversion request is granted immediately. If the requested lock mode is incompatible with the existing locks in the granted queue, the request is placed at the end of the conversion queue. The lock retains its granted mode until the conversion request is granted.

After the DLM grants the conversion request, it grants any compatible requests immediately following it on the conversion queue. The DLM continues to grant requests until the conversion queue is empty or it encounters an incompatible lock.

When the conversion queue is empty, the DLM checks the waiting queue. It grants the first lock request on the waiting queue if it is compatible with the locks currently granted. The DLM continues to grant requests until the waiting queue is empty or it encounters an incompatible lock.
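
The queue processing described above can be sketched as a small simulation. This is a simplified standalone model, not the DLM's implementation: the lock structure, the grantable helper, and the drain_queues function are hypothetical, array order stands in for queue order, and a converting lock is assumed to keep blocking others at its currently granted mode until its conversion is granted.

```c
#include <assert.h>
#include <stddef.h>

enum lock_mode  { MODE_NL, MODE_CR, MODE_CW, MODE_PR, MODE_PW, MODE_EX, MODE_COUNT };
enum lock_state { ST_GRANTED, ST_CONVERTING, ST_WAITING };

/* Hypothetical model of one lock on a single resource. */
struct lock {
    enum lock_mode  granted;    /* mode currently held (ignored while waiting) */
    enum lock_mode  requested;  /* mode being requested */
    enum lock_state state;
};

/* Compatibility matrix from Table 9-3, indexed [requested][granted]. */
static const int compat[MODE_COUNT][MODE_COUNT] = {
    { 1, 1, 1, 1, 1, 1 },  /* NL */
    { 1, 1, 1, 1, 1, 0 },  /* CR */
    { 1, 1, 1, 0, 0, 0 },  /* CW */
    { 1, 1, 0, 1, 0, 0 },  /* PR */
    { 1, 1, 0, 0, 0, 0 },  /* PW */
    { 1, 0, 0, 0, 0, 0 },  /* EX */
};

/* A request is grantable if it is compatible with the held mode of every
 * other lock that is granted or converting. */
static int grantable(const struct lock *locks, size_t n, size_t self)
{
    for (size_t i = 0; i < n; i++) {
        if (i == self || locks[i].state == ST_WAITING)
            continue;
        if (!compat[locks[self].requested][locks[i].granted])
            return 0;
    }
    return 1;
}

/* Grants queued requests in order: conversions first, then waiting
 * requests, but only once no conversion remains queued. Processing stops
 * at the first incompatible request. Returns the number of grants. */
int drain_queues(struct lock *locks, size_t n)
{
    static const enum lock_state pass[] = { ST_CONVERTING, ST_WAITING };
    int granted = 0;

    assert(locks != NULL);
    for (int p = 0; p < 2; p++) {
        for (size_t i = 0; i < n; i++) {
            if (locks[i].state != pass[p])
                continue;
            if (!grantable(locks, n, i))
                return granted;
            locks[i].granted = locks[i].requested;
            locks[i].state = ST_GRANTED;
            granted++;
        }
    }
    return granted;
}
```

In this model, a waiting CR request stays queued behind a blocked PR-to-EX conversion even though CR is compatible with the granted PR locks, because the waiting queue is examined only when the conversion queue is empty.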

9.3.5    Deadlock Detection

The DLM can detect two forms of deadlock:

Figure 9-4:  Multiple Resource Deadlock

If the DLM determines that either a conversion deadlock or a multiple resource deadlock exists, it chooses a lock to use as a victim to break the deadlock. Although the victim is arbitrarily selected, it is guaranteed to be either on the conversion or waiting queue (that is, it is not in the granted queue). The DLM returns a DLM_DEADLOCK final completion status code to the process that issued this dlm_lock, dlm_locktp, or dlm_cvt function call (or provides this status in the completion_status parameter to the completion routine specified in the call to the dlm_quelock, dlm_quelocktp, or dlm_quecvt function). Granted locks are never revoked; only converting and waiting lock requests can receive the DLM_DEADLOCK status code.

Note

You must not make assumptions about which lock the DLM will choose to break a deadlock. Also, it is possible to have undetectable deadlocks when other services such as semaphores or file locks are used in conjunction with the DLM. The DLM detects only those deadlocks involving its own locks.

9.4    Dequeuing Locks

When a process no longer needs a lock on a resource, it can release the lock by calling the dlm_unlock function.

When a lock is released, the specified lock request is removed from whatever queue it is in. Locks are dequeued from any queue: granted, waiting, or conversion. When the last lock on a resource is dequeued, the resource is deleted from the distributed lock manager (DLM) database.

The dlm_unlock function can write or invalidate the resource's lock value block if it specifies the valb parameter and the DLM_VALB flag. If the lock to be dequeued has a granted mode of PW or EX, the contents of the process's value block are stored in the resource value block. If the lock being dequeued is in any other mode, the lock value block is not used. If the DLM_INVVALBLK flag is specified, the resource's lock value block is marked invalid.

The dlm_unlock function uses the following flags:

You cannot specify both the DLM_VALB and DLM_INVVALBLK flags in the same request.

9.4.1    Canceling a Conversion Request

The dlm_cancel function cancels a lock conversion. A process can cancel a lock conversion only if the lock request has not yet been granted, in which case the request is in the conversion queue. Cancellation causes a lock in the conversion queue to revert to the granted lock mode it had before the conversion request. The blkrtn and notprm values of the lock also revert to the old values. The DLM calls any completion routine specified in the conversion request to indicate that the request has been canceled. The returned status is DLM_CANCELLED.

9.5    Advanced Locking Techniques

The previous sections discussed locking techniques and concepts useful to all applications. The following sections discuss specialized features of the distributed lock manager (DLM).

9.5.1    Asynchronous Completion of a Lock Request

The dlm_lock, dlm_locktp, and dlm_cvt functions complete when the lock request has been granted or has failed, as indicated by the return status value.

If an application does not want to wait for completion of the lock request, it should use the dlm_quelock, dlm_quelocktp, or dlm_quecvt function. These functions return control to the calling program after the lock request is queued. The status value returned by these functions indicates whether the request was queued successfully or was rejected. After a request is queued, the calling program cannot access the resource until the request is granted.

Calls to the dlm_quelock, dlm_quelocktp, and dlm_quecvt functions must specify the address of a completion routine. The completion routine runs when the lock request is successful or unsuccessful. The DLM passes to the completion routines status information that indicates the success or failure of the lock request.

Note

If an application wants the DLM to deliver completion notifications, it must call the dlm_set_signal function once before making the first lock request requiring one. Alternatively, the application can periodically call the dlm_notify function. The dlm_notify function enables a process to poll for pending notifications and request their delivery, without needing to call the dlm_set_signal function. The polling method is not recommended.

9.5.2    Notification of Synchronous Completion

The DLM provides a mechanism that allows processes to determine if a lock request is granted synchronously; that is, if the lock is not placed on the conversion or waiting queue. By avoiding the overhead of signal delivery and the resulting execution of a completion routine, an application can use this feature to improve performance in situations where most locks are granted synchronously (as is normally the case). An application can also use this feature to test for the absence of a conflicting lock when the request is processed.

This feature works as follows:

9.5.3    Blocking Notifications

In some applications that use the DLM functions, a process must know if it is preventing another process from locking a resource. The DLM informs processes of this by using blocking notifications. To enable blocking notifications, the blkrtn parameter of the lock request must contain the address of a blocking notification routine. When the lock prevents another lock from being granted, a blocking notification is delivered and the blocking notification routine is executed.

The DLM provides the blocking notification routine with the following parameters:

notprm

Context parameter of the blocking lock. This parameter was supplied by the caller of the dlm_lock, dlm_locktp, dlm_quelock, dlm_quelocktp, dlm_cvt, or dlm_quecvt function in the lock request for the blocking lock.

blocked_hint

The hint parameter from the first blocked lock. This parameter was supplied by the caller of the dlm_lock, dlm_locktp, dlm_quelock, dlm_quelocktp, dlm_cvt, or dlm_quecvt function in the lock request for the first blocked lock.

lkid

Pointer to the lock ID of the blocking lock.

blocked_mode

Requested mode of the first blocked lock.

By the time the notification is delivered, the following conditions could still exist:

Because these conditions are possible, the DLM can make no guarantees about the validity of the blocked_hint and blocked_mode parameters at the time the blocking routine is executed.

Note

If an application wants the DLM to deliver blocking notifications, it must call the dlm_set_signal function once before making the first lock request requiring a blocking notification.

Note also that if the signal specified in the dlm_set_signal call is blocked, the blocking notification will not be delivered until the signal is unblocked. Alternatively, the application can periodically call the dlm_notify function. The dlm_notify function enables a process to poll for pending notifications and request their delivery. The polling method is not recommended.

9.5.4    Lock Conversions

Lock conversions perform the following functions:

9.5.4.1    Queuing Lock Conversions

To perform a lock conversion, a procedure calls the dlm_cvt or dlm_quecvt function. The lock being converted is identified by the lkid_p parameter. A lock must be granted before it can be the object of a conversion request.

9.5.4.2    Forced Queuing of Conversions

To promote more equitable access to a given resource, you can force certain conversion requests to be queued that would otherwise be granted. A conversion request with the DLM_QUECVT flag set is forced to wait behind any already queued conversions. In this manner, you can specify the DLM_QUECVT flag to give other locks a chance of being granted. However, the conversion request is granted immediately if there are no conversions already queued.

The DLM_QUECVT behavior is valid only for a subset of all possible conversions. Table 9-5 defines the set of conversion requests that are permitted when you specify the DLM_QUECVT flag. Illegal conversion requests fail with a return status of DLM_BADPARAM.

Table 9-5:  Conversions Allowed when the DLM_QUECVT Flag is Specified

                              Mode to Which Lock is Converted
Mode at Which Lock is Held    NL     CR     CW     PR     PW     EX
Null (NL)                     ---    ---    ---    ---    ---    ---
Concurrent Read (CR)          ---    ---    Legal  Legal  Legal  Legal
Concurrent Write (CW)         ---    ---    ---    Legal  Legal  Legal
Protected Read (PR)           ---    ---    Legal  ---    Legal  Legal
Protected Write (PW)          ---    ---    ---    ---    ---    ---
Exclusive (EX)                ---    ---    ---    ---    ---    ---

9.5.5    Parent Locks

When a process calls the dlm_lock, dlm_locktp, dlm_quelock, or dlm_quelocktp function to issue a lock request, it can declare a parent lock for the new lock by specifying the parent ID in the parid parameter. Locks with parents are called sublocks. A parent lock must be granted before the sublocks belonging to the parent can be granted in the same or some other mode.

The benefit of using parent locks and sublocks is that they allow low-level locks (Concurrent Read or Concurrent Write) to be held at a coarse granularity, while higher-level (Protected Write or Exclusive mode) sublocks are held on resources of a finer granularity. For example, a low-level lock might control access to an entire file, while higher-level sublocks protect individual records or data items in the file.

Assume that a number of processes need to access a database. The database can be locked at two levels: the file and individual records. When updating all the records in a file, locking the whole file and updating the records without additional locking is faster and more efficient. But, when updating selected records, locking each record as it is needed is preferable.

To use parent locks in this way, all processes request locks on the file. Processes that need to update all records must request Protected Write (PW) or Exclusive (EX) mode locks on the file. Processes that need to update individual records request Concurrent Write (CW) mode locks on the file, and then use sublocks to lock the individual records in PW or EX mode.

In this way, the processes that need to access all records can do so by locking the file, while processes that share the file can lock individual records. A number of processes can share the file-level lock at concurrent write mode, while their sublocks update selected records.

9.5.6    Lock Value Blocks

The lock value block is a structure, DLM_VALBLKSIZE unsigned longwords in size, that a process associates with a resource by specifying the valb parameter and the DLM_VALB option in calls to DLM functions. When the lock manager creates a resource, it also creates a lock value block for that resource. The DLM maintains the resource lock value block until there are no more locks on the resource.

When a process specifies the DLM_VALB option and a valid address in the valb parameter in a new lock request and the request is granted, the contents of the resource lock value block are copied to the process's lock value block.

When a process specifies the valb parameter and the DLM_VALB option in a conversion from PW mode or EX mode to the same or a lower mode, the contents of the process's lock value block are stored in the resource lock value block.

In this manner, processes can pass (and update) the value in the lock value block along with the ownership of a resource. Table 9-6 shows how lock conversions affect the contents of the process's and the resource's lock value block.

Table 9-6:  Effect of Lock Conversion on Lock Value Block

                              Mode to Which Lock Is Converted
Mode at Which Lock Is Held    NL     CR     CW     PR     PW     EX
Null (NL)                     Read   Read   Read   Read   Read   Read
Concurrent Read (CR)          ---    Read   Read   Read   Read   Read
Concurrent Write (CW)         ---    ---    Read   Read   Read   Read
Protected Read (PR)           ---    ---    ---    Read   Read   Read
Protected Write (PW)          Write  Write  Write  Write  Write  Read
Exclusive (EX)                Write  Write  Write  Write  Write  Write

Note that when a granted PW or EX mode lock is released using the dlm_unlock function with the address of a lock value block in the valb parameter and the DLM_VALB option specified, the contents of the process's lock value block are written to the resource lock value block. If the lock being released is in any other mode, the lock value block is not used.
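
The rules in Table 9-6 and the dlm_unlock behavior just described can be captured in a lookup table. The following is a minimal standalone sketch rather than DLM code: the enumerations and the valb_effect_on_convert and unlock_writes_valb functions are hypothetical names, and the sketch assumes the caller has supplied a valid valb address together with the DLM_VALB option.

```c
#include <assert.h>

enum lock_mode   { MODE_NL, MODE_CR, MODE_CW, MODE_PR, MODE_PW, MODE_EX, MODE_COUNT };

/* What happens to the value block when a conversion is granted. */
enum valb_effect {
    VB_NONE,   /* "---": value block not used */
    VB_READ,   /* resource value block copied to the process */
    VB_WRITE   /* process value block stored in the resource */
};

/* effect[held][target], copied row by row from Table 9-6. */
static const enum valb_effect effect[MODE_COUNT][MODE_COUNT] = {
    /* NL */ { VB_READ,  VB_READ,  VB_READ,  VB_READ,  VB_READ,  VB_READ  },
    /* CR */ { VB_NONE,  VB_READ,  VB_READ,  VB_READ,  VB_READ,  VB_READ  },
    /* CW */ { VB_NONE,  VB_NONE,  VB_READ,  VB_READ,  VB_READ,  VB_READ  },
    /* PR */ { VB_NONE,  VB_NONE,  VB_NONE,  VB_READ,  VB_READ,  VB_READ  },
    /* PW */ { VB_WRITE, VB_WRITE, VB_WRITE, VB_WRITE, VB_WRITE, VB_READ  },
    /* EX */ { VB_WRITE, VB_WRITE, VB_WRITE, VB_WRITE, VB_WRITE, VB_WRITE },
};

/* Effect on the value block of converting a lock from `held` to `target`. */
enum valb_effect valb_effect_on_convert(enum lock_mode held, enum lock_mode target)
{
    assert(held < MODE_COUNT && target < MODE_COUNT);
    return effect[held][target];
}

/* Releasing a lock writes the value block only from PW or EX mode. */
int unlock_writes_valb(enum lock_mode held)
{
    assert(held < MODE_COUNT);
    return held == MODE_PW || held == MODE_EX;
}
```

A caller could use such a table to decide whether the contents of its value block are meaningful after a conversion, for example treating the block as freshly read only when the result is VB_READ.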

In some situations, the resource lock value block can become invalid. When this occurs, the DLM warns the caller of a function specifying the valb parameter by returning the completion status of DLM_VALNOTVALID. The following events can invalidate the resource lock value block:

9.6    Local Buffer Caching Using DLM Functions

Applications can use the distributed lock manager (DLM) to perform local buffer caching (also called distributed buffer management). Local buffer caching allows a number of processes to maintain copies of data (for example, disk blocks) in buffers local to each process, and to be notified when the buffers contain invalid data due to modifications by another process. In applications where modifications are infrequent, you may save substantial I/O by maintaining local copies of buffers -- hence, the names local buffer caching or distributed buffer management. Either the lock value block or blocking notifications (or both) can be used to perform buffer caching.

9.6.1    Using the Lock Value Block

To support local buffer caching using the lock value block, each process maintaining a cache of buffers maintains a Null (NL) mode lock on a resource that represents the current contents of each buffer. (For this discussion, assume that the buffers contain disk blocks.) The lock value block associated with each resource is used to contain a disk block version number. The first time a lock is obtained on a particular disk block, the DLM returns the current version number of that disk block in the lock value block of the process.

If the contents of the buffer are cached, this version number is saved along with the buffer. To reuse the contents of the buffer, the NL mode lock must be converted to Protected Read (PR) mode or Exclusive (EX) mode, depending on whether the buffer is to be read or written. This conversion returns the latest version number of the disk block. The application compares the version number of the disk block with the saved version number. If they are equal, the cached copy is valid. If they are not equal, the application must read a fresh copy of the disk block from disk.

Whenever a procedure modifies a buffer, it writes the modified buffer to disk and then increments the version number before converting the corresponding lock to NL mode. In this way, the next process that attempts to use its local copy of the same buffer will find a version number mismatch and must read the latest copy from disk, rather than use its cached (now invalid) buffer.
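
The version-number protocol can be illustrated with a small single-process simulation. This is a sketch under stated assumptions, not DLM code: the shared_block and cached_block structures and the cache_read and cache_write functions are hypothetical, the version field stands in for the number kept in the resource's lock value block, and the lock conversions themselves appear only as comments.

```c
#include <assert.h>
#include <string.h>

#define BLOCK_SIZE 16

/* Stand-in for shared state: the disk block plus the version number that
 * the text keeps in the resource's lock value block. */
struct shared_block {
    unsigned long version;
    char          disk[BLOCK_SIZE];
    int           disk_reads;        /* counts simulated disk reads */
};

/* Per-process cached copy of the block. */
struct cached_block {
    int           valid;             /* 0 until the block is first read */
    unsigned long saved_version;
    char          data[BLOCK_SIZE];
};

/* Models: convert NL -> PR (returns the current version), compare with the
 * saved version, reread from disk only if stale, convert back to NL. */
const char *cache_read(struct shared_block *sb, struct cached_block *cb)
{
    assert(sb != NULL && cb != NULL);
    if (!cb->valid || cb->saved_version != sb->version) {
        memcpy(cb->data, sb->disk, BLOCK_SIZE);
        sb->disk_reads++;
        cb->saved_version = sb->version;
        cb->valid = 1;
    }
    return cb->data;
}

/* Models: convert NL -> EX, replace the whole buffer, write it to disk,
 * increment the version, convert back to NL (storing the new version). */
void cache_write(struct shared_block *sb, struct cached_block *cb, const char *text)
{
    assert(sb != NULL && cb != NULL && text != NULL);
    strncpy(cb->data, text, BLOCK_SIZE - 1);
    cb->data[BLOCK_SIZE - 1] = '\0';
    memcpy(sb->disk, cb->data, BLOCK_SIZE);
    sb->version++;
    cb->saved_version = sb->version;  /* the writer's own copy stays valid */
    cb->valid = 1;
}
```

In this model, two consecutive reads by the same process cost one disk read, and a write by another process forces exactly one further disk read the next time the first process uses the block.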

9.6.2    Using Blocking Notifications

Blocking notifications are used to notify processes with granted locks that another process with an incompatible lock mode has been queued to access the same resource.

You may use blocking notifications to support local buffer caching in two ways. One technique involves deferred buffer writes; the other technique is an alternate method of local buffer caching without using lock value blocks.

9.6.2.1    Deferring Buffer Writes

When local buffer caching is being performed, a modified buffer must be written to disk before the EX mode lock can be released. If a large number of modifications are expected (particularly over a short period of time), you can reduce disk I/O by maintaining the EX mode lock for the entire time that the modifications are being made, and writing the buffer once.

However, this prevents other processes from using the same disk block during this interval. This can be avoided if the process holding EX mode lock has a blocking notification. The notification will notify the process if another process needs to use the same disk block. The holder of the EX mode lock can then write the buffer to disk and convert its lock to NL mode (which allows the other process to access the disk block). However, if no other process needs the same disk block, the first process can modify it many times, but write it only once.

Note

After a blocking notification is delivered to a process, the process must convert the lock to receive any subsequent blocking notifications.

9.6.2.2    Buffer Caching

To perform local buffer caching using blocking notifications, processes do not convert their locks to NL mode from PR or EX mode when finished with the buffer. Instead, they receive blocking notifications whenever another process attempts to lock the same resource in an incompatible lock mode. With this technique, processes are notified that their cached buffers are invalid as soon as a writer needs the buffer, rather than the next time the process tries to use the buffer.

9.6.3    Choosing a Buffer Caching Technique

The choice between using version numbers or blocking notifications to perform local buffer caching depends on the characteristics of the application. An application that uses version numbers performs more lock conversions, while one that uses blocking notifications delivers more notifications. Note that these techniques are compatible; some processes can use one technique at the same time that other processes use the other. Generally speaking, blocking notifications are preferred in a low-contention environment, while version numbers are preferred in a high-contention environment. You may even invent combined or adaptive strategies.

In a combined strategy, the applications use specific techniques. If a process is expected to reuse the contents of a buffer in a short amount of time, blocking notifications are used; if there is no reason to expect a quick reuse, version numbers are used.

In an adaptive strategy, an application makes evaluations on the rate of blocking notifications and conversions. If blocking notifications arrive frequently, the application changes to using version numbers; if many conversions take place and the same cached copy remains valid, the application changes to using blocking notifications.

For example, consider the case where one process continually displays the state of a database, while another occasionally updates it. If version numbers are used, the displaying process must always check to see that its copy of the database is valid (by performing a lock conversion); if blocking notifications are used, the displaying process is informed every time the database is updated. However, if updates occur frequently, using version numbers is preferable to continually delivering blocking notifications.
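
One way to picture the adaptive strategy is as a decision function evaluated over a measurement window. The following is only an illustrative sketch: the cache_stats structure, the choose_technique function, and both threshold values are hypothetical and would need tuning for a real application.

```c
#include <assert.h>
#include <stddef.h>

enum cache_technique { USE_VERSION_NUMBERS, USE_BLOCKING_NOTIFICATIONS };

/* Counters gathered by the application over one measurement window. */
struct cache_stats {
    unsigned notifications;        /* blocking notifications received */
    unsigned conversions;          /* lock conversions performed */
    unsigned valid_after_convert;  /* conversions that found the cache valid */
};

/* Hypothetical thresholds; real values depend on the workload. */
#define NOTIFY_LIMIT  10u
#define CONVERT_LIMIT 10u

/* Returns the technique to use for the next window. */
enum cache_technique choose_technique(enum cache_technique current,
                                      const struct cache_stats *s)
{
    assert(s != NULL);

    /* Frequent blocking notifications: switch to version numbers. */
    if (current == USE_BLOCKING_NOTIFICATIONS && s->notifications > NOTIFY_LIMIT)
        return USE_VERSION_NUMBERS;

    /* Many conversions that mostly find the cached copy still valid:
     * switch to blocking notifications. */
    if (current == USE_VERSION_NUMBERS &&
        s->conversions > CONVERT_LIMIT &&
        s->valid_after_convert * 2 > s->conversions)
        return USE_BLOCKING_NOTIFICATIONS;

    return current;
}
```

Requiring that more than half of the conversions find a valid cached copy is one arbitrary way to express "the same cached copy remains valid"; an application could equally use a different ratio or add hysteresis to avoid switching back and forth.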

9.7    Distributed Lock Manager Functions Code Example

The following programs show the basic mechanisms an application uses to join a namespace and establish an initial lock on a resource in that namespace. They also demonstrate key distributed lock manager (DLM) concepts such as lock conversion, the use of lock value blocks, and the use of blocking notification routines.

The api_ex_master.c and api_ex_client.c programs, listed in Example 9-1 and available from the /usr/examples/cluster directory, can execute in parallel on the same cluster member or on different cluster members. You must run both programs from accounts with the same user ID (UID) and you must start the api_ex_master.c program first. They display output similar to the following:

% api_ex_master &
api_ex_master: grab a EX mode lock
api_ex_master: value block read
api_ex_master: expected empty value block got <>
api_ex_master: start client and wait for the blocking notification to
               continue
% api_ex_client &
        api_ex_client: grab a NL mode lock
        api_ex_client: value block read
        api_ex_client: expected empty value block got <>
        api_ex_client: converting to NL->EX to get the value block.
        api_ex_client: should see blocking routine run on master
*** api_ex_master: blk_and_go hold the lock for a couple of seconds
*** api_ex_master: blk_and_go sleeping
*** api_ex_master: blk_and_go sleeping
 
*** api_ex_master: blk_and_go setting done
api_ex_master: now convert (EX-->EX) to write the value block
<abc>
*** api_ex_master: blkrtn: down convert to NL
api_ex_master: waiting for blocking notification
        api_ex_client: value block read
api_ex_master: trying to get the lock back as PR to read value block
        api_ex_client: expected <abc> got <abc>
        *** api_ex_client: blkrtn: dequeue EX lock to write value block
        <>
        *** api_ex_client: hold the lock for a couple of seconds
        *** api_ex_client: sleeping
        *** api_ex_client: sleeping
        *** api_ex_client: sleeping
        api_ex_client: sleeping waiting for blocking notification
api_ex_master: value block read
        api_ex_client: done
api_ex_master: expected <efg> got <efg>
api_ex_master done

Example 9-1:  Locking, Lock Value Blocks, and Lock Conversion

/*************************************************************************
 *                                                                       *
 *                       api_ex_master.c                                 *
 *                                                                       *
 *************************************************************************/
 
 
/* cc -g -o api_ex_master api_ex_master.c -ldlm */
 
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <signal.h>
#include <unistd.h>     /* getuid(), sleep() */
 
#include <sys/dlm.h>
 
char *resnam = "dist shared resource";
char *prog;
volatile sig_atomic_t done = 0;  /* set by the blocking notification routines */
 
#ifdef DLM_DEBUG
int dlm_debug = 2;
#define Cprintf if(dlm_debug)printf
#define Dprintf if(dlm_debug >= 2 )printf
#else /* DLM_DEBUG */
#define Cprintf ;
#define Dprintf ;
#endif /* DLM_DEBUG */
 
void
error(dlm_lkid_t *lk, dlm_status_t stat)
{
        /* lk is null when the failure occurs before a lock ID exists */
        printf("%s: lock error %s on lkid 0x%lx\n",
                prog, dlm_sperrno(stat), lk ? (unsigned long)*lk : 0UL);
        abort();
}
void
blk_and_go(callback_arg_t x, callback_arg_t y, dlm_lkid_t *lk,
		      dlm_lkmode_t blkmode)
{
        int i;
 
        printf("*** %s: blk_and_go hold the lock for a couple of "
               "seconds\n", prog);
        for (i = 0; i < 3; i++) {
	        printf("*** %s: blk_and_go sleeping\n", prog);
	        sleep(1);
        }
        printf("*** %s: blk_and_go setting done\n", prog);
        /* done waiting */
        done = 1;  [13]
}
void
blkrtn(callback_arg_t x, callback_arg_t y, dlm_lkid_t *lk,
		      dlm_lkmode_t blkmode)
{
        dlm_status_t stat;
 
        Cprintf("*** %s: blkrtn: x 0x%lx y 0x%lx lkid 0x%lx blkmode "
                "%d\n", prog, x, y, *lk, blkmode);
        printf("*** %s: blkrtn: down convert to NL\n", prog);
        if ((stat = dlm_cvt(lk, DLM_NLMODE, 0, 0, 0, 0, 0, 0))
					      != DLM_SUCCESS)
         	error(lk, stat);  [16]
        /* let waiters know we're done */
        done = 1;
}
int
main(int argc, char *argv[])
{
        int resnlen, i;
        dlm_lkid_t lkid;
        dlm_status_t stat;
        dlm_valb_t vb;
        dlm_nsp_t nsp;
 
        /* this program must be run first */

        /* set prog before any call to error(), which prints it */
        prog = argv[0];

        /* first we need to join a namespace */
        if ((stat = dlm_nsjoin(getuid(), &nsp, DLM_USER))
                      != DLM_SUCCESS) {  [1]
                printf("%s: can't join namespace\n", argv[0]);
                error(0, stat);
        }
 
        /* now let DLM know what signal to use for blocking routines */
        dlm_set_signal(SIGIO, &i);  [2]
        Cprintf("%s: dlm_set_signal: i %d\n", prog, i);
 
        resnlen = strlen(resnam);  [3]
 
        /* get EX mode lock and establish blocking notif routine */
        printf("%s: grab a EX mode lock\n", prog);
        stat = dlm_lock(nsp, (uchar_t *)resnam, resnlen, 0, &lkid,
        		DLM_EXMODE, &vb, (DLM_VALB | DLM_SYNCSTS),
                        0, 0, blk_and_go, 0);
	                [4]
        /*
         * since we're the only one running it
         * had better be granted DLM_SYNCH status
         */
        if(stat !=  DLM_SYNCH) {
        	printf("%s: dlm_lock failed\n", prog);
        	error(&lkid, stat);  [5]
        }
        /* newly-created value block should be empty */
        printf("%s: value block read\n", prog);
        printf("%s: expected empty value block got <%s>\n", prog,
         				    vb.valblk);
        if (strlen(vb.valblk)) {
	        printf("%s: lock: value block not empty\n", prog);
        	error(&lkid, stat);  [6]
        }
        printf("%s: start client and wait for the blocking "
               "notification to continue\n",
                prog);
        while (!done)
         	sleep(1);  [7]
 
        done = 0;
        /* put a known string into the value block */
        (void) strcat(vb.valblk, "abc");  [14]
        printf("%s: now convert (EX-->EX) to write the value block "
               "<%s>\n", prog, vb.valblk);
        /* use a new blocking routine */
        stat = dlm_cvt(&lkid, DLM_EXMODE, &vb, (DLM_VALB |
                DLM_SYNCSTS), 0, 0, blkrtn, 0);
                [15]
        /*
         * since we own (EX) the resource the
         * convert had better be granted SYNC
         */
        if(stat !=  DLM_SYNCH) {
	        printf("%s: convert failed\n", prog);
        	error(&lkid, stat);
        }
 
        printf("%s: waiting for blocking notification\n", prog);
        while (!done)
        	sleep(1);
        printf("%s: trying to get the lock back as PR to read value "
               "block\n", prog);
        stat = dlm_cvt(&lkid, DLM_PRMODE, &vb, DLM_VALB, 0, 0,
                0, 0);  [19]
        if (stat != DLM_SUCCESS) {
        	printf("%s: error on conversion lock\n", prog);
        	error(&lkid, stat);
        }
        printf("%s: value block read\n", prog);
        printf("%s: expected <efg> got <%s>\n", prog, vb.valblk);
        /* compare to the other known string */
        if (strcmp(vb.valblk, "efg")) {
	        printf("%s: main: value block mismatch <%s>\n",
		prog, vb.valblk);
        	error(&lkid, stat);  [23]
        }
        printf("%s done\n", prog);  [24]
        exit(0);
}
/********************************************************************
*                                                                   *
*                       api_ex_client.c                             *
*                                                                   *
*********************************************************************/
 
/* cc -g -o api_ex_client api_ex_client.c -ldlm */
 
#include <assert.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <strings.h>    /* bzero() */
#include <errno.h>
#include <signal.h>
#include <unistd.h>     /* getuid(), sleep() */
 
#include <sys/dlm.h>
 
char *resnam = "dist shared resource";
char *prog;
volatile sig_atomic_t done = 0;  /* set by the blocking notification routine */
 
#ifdef DLM_DEBUG
int dlm_debug = 2;
#define Cprintf if(dlm_debug)printf
#define Dprintf if(dlm_debug >= 2 )printf
#else /* DLM_DEBUG */
#define Cprintf ;
#define Dprintf ;
#endif /* DLM_DEBUG */
 
void
error(dlm_lkid_t *lk, dlm_status_t stat)
{
        /* lk is null when the failure occurs before a lock ID exists */
        printf("\t%s: lock error %s on lkid 0x%lx\n",
                prog, dlm_sperrno(stat), lk ? (unsigned long)*lk : 0UL);
        abort();
}
 
/*
* blocking routine that will release the lock and in doing so will
* write the resource value block.
*/
void
blkrtn(callback_arg_t x, callback_arg_t y, dlm_lkid_t *lk,
				      dlm_lkmode_t blkmode)
{
        dlm_status_t stat;
        dlm_valb_t   vb;
        int          i;
 
        Cprintf("*** %s: blkrtn: x 0x%lx y 0x%lx lkid 0x%lx blkmode "
                "%d\n", prog, x, y, *lk, blkmode);
        /* clear the local value block before it is printed or used */
        bzero(vb.valblk, DLM_VALBLKSIZE);
        printf("\t*** %s: blkrtn: dequeue EX lock to write "
               "value block <%s>\n", prog, vb.valblk);
        printf("\t*** %s: hold the lock for a couple of seconds\n",
                prog);
        for (i = 0; i < 3; i++) {
                printf("\t*** %s: sleeping\n", prog);
                sleep(1);
        }
        /* write something different */
        (void) strcat(vb.valblk, "efg");  [20]
        if((stat = dlm_unlock(lk, &vb, DLM_VALB)) != DLM_SUCCESS)
        	error(lk, stat);  [21]
        /* let waiters know we're done */
        done = 1;
}
int
main(int argc, char *argv[])
{
        int resnlen, i;
        dlm_lkid_t lkid;
        dlm_status_t stat;
        dlm_nsp_t nsp;
        dlm_valb_t vb;
 
        /* set prog before any call to error(), which prints it */
        prog = argv[0];

        /* first we need to join a namespace */
        if ((stat = dlm_nsjoin(getuid(), &nsp, DLM_USER)) !=
            DLM_SUCCESS) {
                printf("\t%s: can't join namespace\n", argv[0]);
                error(0, stat);  [8]
        }
 
        /* now let DLM know what signal to use for blocking routines */
        dlm_set_signal(SIGIO, &i);
        Cprintf("\t%s: dlm_set_signal: i %d\n", prog, i);  [9]
 
        resnlen = strlen(resnam);
        Cprintf("\t%s: resnam %s\n", prog, resnam);
 
        printf("\t%s: grab a NL mode lock\n", prog);
        stat = dlm_lock(nsp, (uchar_t *)resnam, resnlen, 0, &lkid,
		DLM_NLMODE, &vb, (DLM_VALB | DLM_SYNCSTS),
					      0, 0, 0, 0);
        /* NL mode better be granted SYNC status */
        if(stat !=  DLM_SYNCH) {
		printf("\t%s: dlm_lock failed\n", prog);
		error(&lkid, stat);  [10]
        }
        /* should be nulls since master hasn't written anything yet */
        printf("\t%s: value block read\n", prog);
        printf("\t%s: expected empty value block got <%s>\n", prog,
        vb.valblk);
        if (strlen(vb.valblk)) {
		printf("\t%s: value block not empty\n", prog);
		error(&lkid, stat);  [11]
        }
 
        done = 0;
        printf("\t%s: converting to NL->EX to get the value block.\n",
        prog);
        printf("\t%s: should see blocking routine run on master\n",
        prog);
        stat = dlm_cvt(&lkid, DLM_EXMODE, &vb, DLM_VALB, 0, 0,
				 blkrtn, 0);  [12]
        if(stat !=  DLM_SUCCESS) {
		printf("\t%s: dlm_cvt failed\n", prog);
		error(&lkid, stat);
        }
        /* should have read what master wrote, "abc" */
        printf("\t%s: value block read\n", prog);
        printf("\t%s: expected <abc> got <%s>\n", prog,
        vb.valblk);
        if (strcmp(vb.valblk, "abc")) {
		printf("\t%s: main: value block mismatch <%s>\n",
		prog, vb.valblk);
		error(&lkid, stat);  [17]
        }
        /* now wait for blocking from master */
        printf("\t%s: sleeping waiting for blocking notification\n",
        prog);
        while (!done)
		sleep(1);  [18]
        printf("\t%s: done\n", prog);  [22]
        exit(0);
}

  1. The api_ex_master.c program calls the dlm_nsjoin function to join the namespace of the resource on which it will request a lock. This namespace is the current process's UID, as obtained from the getuid system call. It is a namespace that allows access from processes holding the effective UID of the resource owner, as indicated by the DLM_USER parameter. If successful, the function returns a namespace handle to the location indicated by the nsp parameter. [Return to example]

  2. The api_ex_master.c program calls the dlm_set_signal function to specify that the DLM is to use the SIGIO signal to send completion and blocking notifications to this process. [Return to example]

  3. The api_ex_master.c program obtains the length of the resource name to be supplied in the subsequent call to the dlm_lock function. The name of the resource is dist shared resource. [Return to example]

  4. The api_ex_master.c program calls the dlm_lock function to obtain an Exclusive mode (DLM_EXMODE) lock on the dist shared resource resource in the uid namespace. The namespace handle, resource name, and resource name length are all supplied as required parameters.

    The DLM_SYNCSTS flag indicates that the DLM should return DLM_SYNCH status if the lock request is granted immediately. If the function call is successful, the DLM returns the lock ID of the Exclusive mode (EX) lock to the location specified by the lkid parameter.

    This function call also specifies the DLM_VALB flag and a location to and from which the contents of the lock value block for the resource are written or read. The DLM copies the resource's lock value to this location when the lock requested by the dlm_lock function call is granted. Finally, the function call specifies the blocking notification routine blk_and_go. The DLM will call this routine after the lock has been granted and is blocking another lock request. [Return to example]

  5. The api_ex_master.c program checks the status value returned from the dlm_lock function call. If the status value is not DLM_SYNCH status (the successful condition value requested by the DLM_SYNCSTS flag in the dlm_lock function call), the lock request has had to wait for the lock to be granted. Because no other programs interested in this lock are currently running, this should not be the case. [Return to example]

  6. The api_ex_master.c program checks that the contents of the value block the DLM has written to the location specified by the vb parameter are empty. [Return to example]

  7. The api_ex_master.c program waits for you to start the api_ex_client.c program. It will resume when its Exclusive mode (DLM_EXMODE) lock on the dist shared resource receives blocking notification that it is blocking a lock request on the same resource from the api_ex_client.c program. [Return to example]

  8. After you start it, the api_ex_client.c program calls the dlm_nsjoin function to join the uid namespace: that is, the same namespace that the process running the api_ex_master.c program previously joined. [Return to example]

  9. The api_ex_client.c program, like the api_ex_master.c program, calls the dlm_set_signal function to specify that the DLM is to use the SIGIO signal to send completion and blocking notifications to this process. [Return to example]

  10. The api_ex_client.c program calls the dlm_lock function to obtain a Null mode (DLM_NLMODE) lock on the same resource on which the process running the api_ex_master.c already holds an Exclusive mode lock. The DLM_SYNCSTS flag indicates that the DLM should return DLM_SYNCH status if the lock request is granted immediately. This lock request should be granted immediately, because the Null mode (NL) lock is compatible with the previously granted Exclusive mode lock. This function call also specifies the DLM_VALB flag and a pointer to a lock value block. The DLM copies the resource's lock value to this location when the lock requested by the dlm_lock function call is granted. [Return to example]

  11. The api_ex_client.c program checks the contents of the value block the DLM has written to the location specified by the vb parameter. The value block should be empty because the api_ex_master.c program has not yet written to it. [Return to example]

  12. The api_ex_client.c program calls the dlm_cvt function to convert its Null mode lock on the resource to Exclusive mode. It specifies a blocking notification routine named blkrtn. Because the process running the api_ex_master.c program already holds an Exclusive lock on this resource, it is blocking the api_ex_client.c program's lock conversion request. However, because the Exclusive mode lock taken out by the api_ex_master.c program specifies a blocking notification routine, the DLM uses the SIGIO signal to send the process running the api_ex_master.c program a blocking notification, triggering its blocking notification routine (blk_and_go). [Return to example]

  13. The blk_and_go routine sleeps for three seconds and then sets the done flag, which causes the api_ex_master.c program to resume. [Return to example]

  14. The api_ex_master.c program writes the string abc to its local copy of the resource's value block. [Return to example]

  15. The api_ex_master.c program calls the dlm_cvt function to write to the lock value block. To do so, it "converts" its Exclusive mode lock on the resource to Exclusive mode (DLM_EXMODE), specifying the lock ID, the location of its copy of the value block, and the DLM_VALB flag as parameters to the function call. The DLM_SYNCSTS flag indicates that the DLM should return DLM_SYNCH status if the lock request is granted immediately. This lock conversion request should be granted immediately because the process already holds an Exclusive mode lock on the resource.

    The dlm_cvt function call also specifies the blkrtn routine as a blocking notification routine. The DLM will call this blocking notification routine immediately because this Exclusive mode lock on the resource blocks the lock conversion request from the api_ex_client.c program. [Return to example]

  16. The api_ex_master.c program's blkrtn routine runs and immediately tries to downgrade its lock on the resource from Exclusive mode to Null mode by calling the dlm_cvt function. This call should succeed immediately. [Return to example]

  17. As soon as this conversion takes place, the api_ex_client.c program's lock conversion request succeeds. (The Null mode lock held by the process running the api_ex_master.c program is compatible with the Exclusive mode lock now held by the process running the api_ex_client.c program.) In upgrading the Null mode lock to Exclusive mode, the DLM copies the resource lock value block to the process running the api_ex_client.c program. At this point, the api_ex_client.c program should see the abc text string that the api_ex_master.c program wrote previously to the resource's lock value block. [Return to example]

  18. The api_ex_client.c program goes to sleep waiting for a blocking notification. [Return to example]

  19. The api_ex_master.c program, which has been sleeping since it downgraded its lock on the dist shared resource resource, calls the dlm_cvt function to convert its Null mode lock on the resource to Protected Read mode (DLM_PRMODE). Because the process running the api_ex_client.c program already holds an Exclusive mode lock on this resource, it is blocking the api_ex_master.c program's lock conversion request. (That is, Exclusive mode and Protected Read mode locks are incompatible.) However, because the Exclusive mode lock taken out by the api_ex_client.c program specifies a blocking notification routine, the DLM delivers it a blocking notification by sending it a SIGIO signal, triggering its blocking notification routine (blkrtn). [Return to example]

  20. The blkrtn blocking notification routine in the api_ex_client.c program sleeps for a few seconds and writes the text string efg to its local copy of the resource's value block. [Return to example]

  21. The blkrtn routine calls the dlm_unlock function to release its lock on the resource. By specifying the address of its local copy of the resource's lock value block and the DLM_VALB flag, it asks the DLM to write the local copy of the value block to the resource. The DLM does so only when the granted mode of the lock is Protected Write (DLM_PWMODE) or Exclusive (DLM_EXMODE). The granted mode here is DLM_EXMODE, so the local copy of the value block is written to the resource's lock value block. [Return to example]

  22. The api_ex_client.c program completes and exits. [Return to example]

  23. As soon as the process running the api_ex_client.c program releases its lock on the resource, the api_ex_master.c program's lock conversion request succeeds. In upgrading the Null mode lock to Protected Read mode, the DLM copies the resource lock value block to the process running the api_ex_master.c program. At this point, the api_ex_master.c program should see the efg text string that the api_ex_client.c program wrote previously to the resource's lock value block. [Return to example]

  24. The api_ex_master.c program completes and exits. [Return to example]
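Two rules drive the sequence in the steps above: which lock modes can be held together, and when a value block is written back to the resource. The following sketch models both for the four modes named in this example. It is a standalone illustration, not DLM code: the enum values are local stand-ins for the DLM_*MODE constants, the value block size is assumed, and the function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Local stand-ins for the four lock modes named in this example. */
enum mode { NL, PR, PW, EX, MODE_COUNT };

#define VALBLK_SIZE 16  /* assumed size; the real one is DLM_VALBLKSIZE */

/* compatible[held][requested] is 1 when both locks can be granted at once. */
static const int compatible[MODE_COUNT][MODE_COUNT] = {
        /*          NL PR PW EX */
        /* NL */  { 1, 1, 1, 1 },
        /* PR */  { 1, 1, 0, 0 },
        /* PW */  { 1, 0, 0, 0 },
        /* EX */  { 1, 0, 0, 0 },
};

int
modes_compatible(enum mode held, enum mode requested)
{
        assert(held < MODE_COUNT && requested < MODE_COUNT);
        return compatible[held][requested];
}

/*
 * Model of releasing or down-converting a lock with a value block:
 * the caller's copy replaces the resource's value block only when the
 * lock was held in PW or EX mode. Returns 1 if the block was written.
 */
int
release_with_valblk(enum mode granted, char *resource_vb, const char *local_vb)
{
        assert(resource_vb != 0 && local_vb != 0);
        if (granted != PW && granted != EX)
                return 0;
        memcpy(resource_vb, local_vb, VALBLK_SIZE);
        return 1;
}
```

In terms of the example, modes_compatible(EX, NL) is why the client's Null mode request in step 10 is granted at once, modes_compatible(EX, PR) is why the master's conversion in step 19 must wait, and release_with_valblk(EX, ...) corresponds to the write of efg in step 21.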