This page describes the test bed that we used with our production-level
cluster running TruCluster Software Production Server Version 1.6, and that
also served as the upgrade test bed for the migration to a TruCluster
Server Version 5.0A cluster.
Linked to this page are descriptions of the hardware and software
that we configured for the test bed.
The following sections address some common questions about the
creation of a test bed.
Justification for a Test Bed for a High-Availability Cluster
A typical standalone system can achieve
about 99% availability. Customers invest in Compaq's TruCluster
products to reduce the roughly 90 hours of annual downtime that
this 1% of unavailability represents. A test bed is essential
to protect that investment.
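The arithmetic behind the 90-hour figure can be checked with a short sketch. The function name and the extra availability levels in the loop are illustrative assumptions, not part of the case study; only the 99% figure comes from the text above.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours in a non-leap year


def annual_downtime_hours(availability_percent: float) -> float:
    """Return the expected hours of downtime per year for a given availability."""
    unavailable_fraction = (100.0 - availability_percent) / 100.0
    return HOURS_PER_YEAR * unavailable_fraction


# 99% is the figure quoted in the text; the other levels are shown
# only for comparison and are assumptions of this sketch.
for pct in (99.0, 99.9, 99.99):
    hours = annual_downtime_hours(pct)
    print(f"{pct}% available: {hours:.1f} hours of downtime per year")
```

At 99% availability the result is 87.6 hours, which the text rounds to 90.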
At some point in their life cycle, all systems require hardware
or software changes and software patches. Each change requires
testing, tuning, and downtime. With a test bed, you can change,
test, and tune off-line and spend the 90 hours as production
time rather than downtime.
The Gartner Group (1998) studied downtime costs for a variety
of industries and found that costs ranged from $10,833 per
minute for brokerage applications to $1,491 per minute for
airline reservation systems. Thus, time spent upgrading on
a test bed, identifying and resolving problems, and developing
deployment procedures can generate significant savings by
reducing downtime on the production system.
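To put the per-minute figures in perspective, the following sketch converts them into the cost of an outage of a given length. The per-minute costs are the ones quoted above; the dictionary keys, the function name, and the one-hour outage are hypothetical choices made for illustration.

```python
# Per-minute downtime costs in US dollars, as quoted in the text.
COST_PER_MINUTE_USD = {
    "brokerage": 10_833,
    "airline_reservations": 1_491,
}


def downtime_cost_usd(application: str, downtime_minutes: float) -> float:
    """Estimate the cost of an outage by scaling the per-minute cost linearly."""
    return COST_PER_MINUTE_USD[application] * downtime_minutes


# Hypothetical example: a single one-hour outage for each application type.
for app in COST_PER_MINUTE_USD:
    print(f"{app}: ${downtime_cost_usd(app, 60):,.0f} for a one-hour outage")
```

The linear scaling is a simplification; real outage costs may not grow uniformly with duration.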
A test bed can be key to the quality, stability, and availability
of the production-level cluster. We use our test bed to:
- Run critical applications in an environment changed by patches,
new software, or new hardware and see the effects of those
changes before they can impact the business.
- Develop maintenance procedures off-line that help us to minimize
scheduled downtime on our production-level cluster.
When creating a test bed, you incur the
expense of purchasing and managing duplicate hardware and
software. However, a test bed is an investment because patches,
new software versions, changes to hardware configurations,
and upgraded software have the potential to place your key
applications at risk. The test bed is the place where you
can quantify that risk and decide how to manage it, without
jeopardizing day-to-day operations.
Test Bed Use
In general, the following changes require
initial application on a test bed and use of a formal process
to deploy them to the production-level cluster:
- Operating system upgrades
- Patch kit installations
- Firmware updates
- Major hardware configuration changes
- Major application upgrades
Typically, a test bed and formal process
are not needed for simple maintenance tasks such as adding
disks to storage shelves. Also, changes that are not complex
can first be made to less critical systems, run there for a reasonable
length of time, and, if they run successfully, then made to the production
system.
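The guidance in this section can be written down as a simple routing rule. This is a minimal sketch: the category names, the function, and the conservative default for unlisted complex changes are all assumptions made for illustration, not a procedure taken from the case study.

```python
# Change types that the text says require a test bed and a formal
# deployment process. The identifier strings are hypothetical labels.
TEST_BED_CHANGES = frozenset({
    "os_upgrade",
    "patch_kit_installation",
    "firmware_update",
    "major_hardware_change",
    "major_application_upgrade",
})

# Simple maintenance tasks that typically need no test bed.
SIMPLE_MAINTENANCE = frozenset({"add_disks_to_storage_shelf"})


def deployment_path(change_type: str, is_complex: bool = True) -> str:
    """Suggest where a change should be applied first."""
    if change_type in TEST_BED_CHANGES:
        return "test bed, then formal deployment"
    if change_type in SIMPLE_MAINTENANCE:
        return "apply directly"
    if not is_complex:
        return "less critical system first, then production"
    # Assumption: an unlisted but complex change is treated conservatively.
    return "test bed, then formal deployment"


print(deployment_path("firmware_update"))
print(deployment_path("add_disks_to_storage_shelf"))
print(deployment_path("tune_kernel_parameter", is_complex=False))
```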
Length of Testing in a Test Bed
Based on our experience, the uniqueness
and complexity of changes determine the length of time that
changes are tested before deployment to the production-level
cluster. If the changed component (hardware or software) is
unique to your environment and it changes the behavior of
the environment, you may want to plan a longer testing period.
If the change is common and behavior is predictable, the testing
period can be short. For example, off-the-shelf hardware or
applications typically require less testing time than customized
hardware or software.
An operating system update is complex; a software patch may be simple.
Either can be high risk because of dependencies on key applications.
As a general guideline,
we exercise operating system updates in the test bed for 20
days before deploying them to the production-level cluster.
We install and exercise a patch kit in the test bed for 10
days before deployment. For more information on patch kits
and the production-level cluster, see TruCluster
Server High Availability Case Study Patching Guidelines.
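As an illustration of how these guideline periods translate into a schedule, the sketch below computes the earliest production deployment date from the date a change was installed on the test bed. The 20-day and 10-day periods come from the text; the dictionary keys, the function name, and the example date are hypothetical.

```python
from datetime import date, timedelta

# Minimum test-bed exercise periods in days, as stated in the guideline above.
MIN_TEST_DAYS = {
    "os_update": 20,
    "patch_kit": 10,
}


def earliest_deployment(change_type: str, installed_on: date) -> date:
    """Return the first date on which the change may go to production."""
    return installed_on + timedelta(days=MIN_TEST_DAYS[change_type])


# Hypothetical example: both changes installed on the test bed on 1 March 2000.
installed = date(2000, 3, 1)
for change in MIN_TEST_DAYS:
    print(change, "->", earliest_deployment(change, installed).isoformat())
```

The periods are counted in calendar days here; a site that counts only working days would need a different calculation.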
Level of Duplication Between a Test Bed and a Production-Level Environment
Ideally, a test bed is a mirror image of the production-level cluster
environment, but the cost of duplicate hardware and software
invariably compromises that goal. More important than duplicating
the environment is the need to test changes against critical
applications that run under the same operating system versions
and services.
Industry analysts suggest that system administrators do the following:
- Use the same hardware and software versions in the test environment
and in the production-level cluster.
- Segregate the test bed from the production-level cluster to ensure
that code under development or testing does not leak into
the production-level cluster.
- Use software tools and formal procedures to test new software,
hardware, and patches. If you use formal procedures, you
can ensure that tests are consistent, link defects back
to the point of failure, and verify that code is fully tested
before you deploy it to the production-level cluster.
Establishing Test Bed Objectives
Some analysts have described the following relationships between
the size and completeness of a test bed and the objectives
of the test bed:
- System and load testing require a test bed that is similar in size
and scope to the production-level cluster. The greater the
difference between the test bed and the production-level
cluster, the less credible the results of system and stress
testing. The test environment should support various stress-level
scenarios and automated testing and measuring tools.
- Likewise, the more complex the production-level cluster, the more
thoroughly the test bed should mirror that environment.
For the sake of reliable measurement, a system administrator
should not test patches or other changes on a subset of
a complex system.
Other Factors That Affect Test Bed Decisions
- Resource Constraints: The physical limits of a lab, the ability
to segregate a test bed from a production-level cluster,
and budget limitations all affect decisions on the size,
content, and completeness of a test bed.
- Culture: A test bed represents a reasoned approach toward
managing risk. The deployment of updated applications, new
hardware, and patches all entail the risk of disrupting
a production-level cluster. Management must support the
costs involved in managing the risk.
- Support: A test bed, as with any other system of hardware
and software, requires support. People must be charged with
responsibility for maintaining an up-to-date test bed environment.