TruCluster Server High Availability Case Study Test Bed Overview

This page describes the test bed that we used with our production-level TruCluster Software Production Server Version 1.6 cluster. The same test bed served as the upgrade test bed for the migration to a TruCluster Server Version 5.0A cluster.

Linked to this page are descriptions of the hardware and software that we configured for the test bed.

The following sections address some common questions about the creation of a test bed.

Justification for a Test Bed for a High-Availability Cluster

A typical standalone system can achieve about 99% availability. The remaining 1% of unavailability amounts to roughly 90 hours of downtime per year (1% of 8,760 hours is about 88 hours). Customers invest in Compaq's TruCluster products to reduce that downtime. A test bed is essential to protect that investment.
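The downtime figure above follows directly from the availability percentage. The following minimal sketch shows the arithmetic, assuming a 365-day year; the function name and the additional availability levels in the loop are illustrative and do not come from the case study.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years (assumption)


def annual_downtime_hours(availability_percent: float) -> float:
    """Return the hours per year a system is unavailable at a given availability."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


# 99% is the standalone figure quoted above; the other levels are
# shown only for comparison and are not taken from the case study.
for pct in (99.0, 99.9, 99.99):
    print(f"{pct}% available: {annual_downtime_hours(pct):.2f} hours down per year")
```

At 99% availability the result is 87.6 hours, which the text rounds to 90.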

At some point in their life cycle, all systems require hardware or software changes and software patches. Each change requires testing, tuning, and downtime. With a test bed, you can change, test, and tune off-line and spend the 90 hours as production time rather than downtime.

The Gartner Group (1998) studied downtime costs for a variety of industries and found that costs ranged from $10,833 per minute for brokerage applications to $1,491 per minute for airline reservation systems. Thus, time spent upgrading on a test bed, identifying and resolving problems, and developing deployment procedures can generate significant savings by reducing downtime on the production system.
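To put those per-minute figures in perspective, the following sketch multiplies them by the length of an outage. The two rates are the ones quoted above; the dictionary keys, the function name, and the one-hour outage are hypothetical and chosen only for illustration.

```python
# Per-minute downtime costs in US dollars, as quoted above from the
# Gartner Group (1998). The keys are illustrative labels.
COST_PER_MINUTE_USD = {
    "brokerage": 10_833,
    "airline_reservations": 1_491,
}


def downtime_cost_usd(industry: str, minutes: float) -> float:
    """Estimate the cost of an outage lasting the given number of minutes."""
    return COST_PER_MINUTE_USD[industry] * minutes


# A one-hour outage is an assumed example, not a figure from the study.
for industry in COST_PER_MINUTE_USD:
    print(f"{industry}: ${downtime_cost_usd(industry, 60):,.0f} per hour of downtime")
```

Under these rates, a single hour of downtime costs $649,980 for a brokerage application and $89,460 for an airline reservation system.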

A test bed can be key to the quality, stability, and availability of the production-level cluster. We use our test bed to:

  • Run critical applications in an environment changed by patches, new software, or new hardware and see the effects of those changes before they can impact the business.
  • Develop maintenance procedures off-line that help us to minimize scheduled downtime on our production-level cluster.

When creating a test bed, you incur the expense of purchasing and managing duplicate hardware and software. However, a test bed is an investment because patches, new software versions, changes to hardware configurations, and upgraded software have the potential to place your key applications at risk. The test bed is the place where you can quantify that risk and decide how to manage it, without jeopardizing day-to-day operations.

Test Bed Use

In general, the following changes should be applied first on a test bed and then deployed to the production-level cluster through a formal process:

  • Operating system upgrades
  • Patch kit installations
  • Firmware updates
  • Major hardware configuration changes
  • Major application upgrades

Typically, a test bed and formal process are not needed for simple maintenance tasks such as adding disks to storage shelves. Also, changes that are not complex can be made first to less critical systems, run for a reasonable length of time, and, if they run successfully, then made to the production system.
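One way to make this policy explicit is to encode it as a small lookup. The sketch below is only an illustration of the guidance above: the category names, the `Route` labels, and the `is_complex` flag are our own hypothetical choices, and treating an unlisted change as complex by default is an assumption on the cautious side.

```python
from enum import Enum


class Route(Enum):
    TEST_BED = "apply on the test bed, then deploy through the formal process"
    LESS_CRITICAL_FIRST = "apply to a less critical system first"
    DIRECT = "simple maintenance, no test bed needed"


# The five change types listed above (names are illustrative).
TEST_BED_CHANGES = {
    "os_upgrade",
    "patch_kit",
    "firmware_update",
    "major_hardware_change",
    "major_application_upgrade",
}

# The example of simple maintenance given above.
SIMPLE_MAINTENANCE = {"add_disk_to_shelf"}


def route_change(kind: str, is_complex: bool = True) -> Route:
    """Pick a deployment route for a change, following the guidance above."""
    if kind in TEST_BED_CHANGES:
        return Route.TEST_BED
    if kind in SIMPLE_MAINTENANCE:
        return Route.DIRECT
    # Unlisted changes: assume complex unless the caller says otherwise.
    return Route.TEST_BED if is_complex else Route.LESS_CRITICAL_FIRST


print(route_change("patch_kit").value)
print(route_change("add_disk_to_shelf").value)
```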

Length of Testing in a Test Bed

Based on our experience, the uniqueness and complexity of changes determine the length of time that changes are tested before deployment to the production-level cluster. If the changed component (hardware or software) is unique to your environment and it changes the behavior of the environment, you may want to plan a longer testing period. If the change is common and behavior is predictable, the testing period can be short. For example, off-the-shelf hardware or applications typically require less testing time than customized hardware or software.

An operating system update is complex; a software patch may be simple. Either can be high risk because of dependencies with key applications. As a general guideline, we exercise operating system updates in the test bed for 20 days before deploying them to the production-level cluster. We install and exercise a patch kit in the test bed for 10 days before deployment. For more information on patch kits and the production-level cluster, see TruCluster Server High Availability Case Study Patching Guidelines.
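These guideline periods translate into an earliest deployment date for each change. The sketch below assumes the periods are counted in calendar days, which the guideline does not specify; the function name, dictionary keys, and example date are hypothetical.

```python
from datetime import date, timedelta

# Guideline test periods from the text above, assumed to be calendar days.
TEST_PERIOD_DAYS = {
    "os_update": 20,
    "patch_kit": 10,
}


def earliest_deploy_date(change_type: str, installed_on: date) -> date:
    """Return the first date a change may go to production after its test period."""
    return installed_on + timedelta(days=TEST_PERIOD_DAYS[change_type])


# Hypothetical installation date on the test bed.
installed = date(2001, 3, 1)
for change_type in TEST_PERIOD_DAYS:
    print(change_type, earliest_deploy_date(change_type, installed).isoformat())
```

For a change installed on the test bed on 1 March, an operating system update becomes eligible for production on 21 March and a patch kit on 11 March.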

Level of Duplication Between a Test Bed and a Production-Level Environment

Ideally, a test bed is a mirror image of the production-level cluster environment, but the cost of duplicate hardware and software invariably compromises that goal. More important than duplicating the environment is the need to test changes against critical applications that run under the same operating system versions and services.

Industry analysts suggest that system administrators do the following:

  • Use the same hardware and software versions in the test environment and in the production-level cluster.
  • Segregate the test bed from the production-level cluster to ensure that code under development or testing does not leak into the production-level cluster.
  • Use software tools and formal procedures to test new software, hardware, and patches. If you use formal procedures, you can ensure that tests are consistent, link defects back to the point of failure, and verify that code is fully tested before you deploy it to the production-level cluster.

Establishing Test Bed Objectives

Some analysts have described the following relationships between the size and completeness of a test bed and the objectives of the test bed.

  • System and load testing require a test bed that is similar in size and scope to the production-level cluster. The greater the difference between the test bed and the production-level cluster, the less credible the results of system and stress testing. The test environment should support various stress-level scenarios and automated testing and measuring tools.
  • Likewise, the more complex the production-level cluster, the more thoroughly the test bed should mirror that environment. For the sake of reliable measurement, a system administrator should not test patches or other changes on a subset of a complex system.

Other Factors That Affect Test Bed Decisions

  • Resource Constraints: The physical limits of a lab, the ability to segregate a test bed from a production-level cluster, and budget limitations all affect decisions on the size, content, and completeness of a test bed.
  • Culture: A test bed represents a reasoned approach toward managing risk. The deployment of updated applications, new hardware, and patches all entail the risk of disrupting a production-level cluster. Management must support the costs involved in managing the risk.
  • Support: A test bed, as with any other system of hardware and software, requires support. People must be charged with responsibility for maintaining an up-to-date test bed environment.
