Architecting Highly Available
CompactPCI Systems
2003/04/04
High Availability is an overused term in today's marketplace. Vendors
have used this term to define architectures as simple as redundant power
supplies and as complicated as fully redundant systems. This leads to
the question "What is High Availability?" It might be easier to think
of High Availability as an increase in the availability of a system, or
a decrease in downtime. Many of today's telecommunications systems require
5NINES availability, or 99.999% uptime. The amount of downtime allowed
in these systems is 5.26 minutes per year (525,600 minutes/year x 0.001%).
The 5 minutes of downtime includes scheduled maintenance as well as any
downtime that might result from the failure of any part of the system.
Designing High Availability systems that are capable of obtaining 5NINES
availability will generally require that every function in the system
be redundant, that is, there is no single point of failure. The road to
High Availability systems generally includes redundant power supplies,
fan trays, and mirrored hard drives. The addition of these redundant components
will decrease the probability that a component failure will cause a system
failure. With these redundant components in place, the availability of
the system has increased; it is now more highly available. As you might
expect, adding redundancy to power supplies, fans and hard drives is
relatively straightforward. Providing redundant compute elements in a
system is a more complicated challenge.
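The downtime arithmetic above, and the effect of adding a redundant component, can be sketched in a few lines of Python. This is a minimal illustration, not a reliability model: the function names are made up for this example, and the redundancy formula assumes independent failures and instantaneous, perfect failover, which real systems only approximate.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, the figure used in the text


def downtime_minutes_per_year(availability: float) -> float:
    """Annual downtime budget for an availability given as a fraction (0..1)."""
    return MINUTES_PER_YEAR * (1.0 - availability)


def parallel_availability(component_availability: float, copies: int) -> float:
    """Availability of `copies` redundant components when any one is enough.

    Assumption: failures are independent and failover is perfect.
    """
    return 1.0 - (1.0 - component_availability) ** copies


if __name__ == "__main__":
    # 5NINES budget: about 5.26 minutes per year.
    print(f"{downtime_minutes_per_year(0.99999):.2f} minutes/year")
    # Two redundant units, each 99.9% available (an illustrative figure).
    print(f"{parallel_availability(0.999, 2):.6f}")
```

Under these assumptions, duplicating a 99.9% available component yields roughly 99.9999% for that function, which is why redundancy is applied to every function in a 5NINES design.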
Application of CompactPCI to High Availability Applications
Developers have been applying PICMG 2.0 CompactPCI Specification compliant
systems to a variety of High Availability applications over the years.
As the market requirements for High Availability have increased, CompactPCI
systems have had to evolve to meet the new challenges. The original CompactPCI
systems were simple bus-based architectures. Figure 1 shows a typical
first-generation CompactPCI architecture.
PICMG 2.0 CompactPCI compliant systems are composed of one or more CompactPCI bus segments. Each segment can contain up to eight CompactPCI board slots: one System Slot and up to seven Peripheral Slots. The PCI bus is the primary communication path between the slots in each bus segment. In this architecture the PCI bus and the System Slot are single points of failure. A misbehaving board in a Peripheral Slot can bring down the entire PCI bus segment, preventing communication between any of the slots. This single point of failure was a significant obstacle to the adoption of CompactPCI in High Availability applications, and early architects of CompactPCI High Availability systems had to work around it. The typical solution was to add a second CompactPCI bus segment and duplicate the functionality in both bus segments. Figure 2 shows an example of a dual CompactPCI bus-based architecture.
In Figure 2, dual bus segments and dual System Slots are used to provide
redundancy for the single points of failure that exist in standard
CompactPCI architectures. In the Dual Segment architecture, each of the System
Slots can control either of the two PCI Bus Segments. By providing redundant
System Slots, a failure of either System Slot can now be compensated for.
This architecture also covers the potential fault of a PCI bus. If a fault
occurs in PCI Bus 1, then PCI Bus 2 is available to handle the task. The
engineering challenges with this kind of architecture are considerable.
The System Slots provide clocks, arbitration and interrupt servicing for
a bus segment. The failover of a System Slot requires that the clock drivers,
request/grant arbitration and interrupt controllers also transfer over
to the active System Slot. Knowing when a bus has failed and then being
able to bring up the redundant System Slot without impacting the total
system availability is difficult. In 1999 PICMG formed a subcommittee
to standardize an implementation of Redundant System Slots. The PICMG
2.13 Redundant System Slot specification was abandoned three years later.
PICMG 2.13 is the only subcommittee that was disbanded without completing
a specification. This is largely due to the complexity of the problem
and the proprietary solutions that already exist. It is clear that Redundant
System Slots in CompactPCI can be used to increase system availability, but at
a cost and at a level of complexity that are prohibitive. Vendors that
provide this type of architecture are selling proprietary solutions -
not open architectures.
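To make the failover problem described above concrete, the following Python sketch models a bus segment whose three System Slot responsibilities (clocks, arbitration, interrupt servicing) must move together to the standby. It is a toy state model only. The class names, the heartbeat mechanism and the threshold of three missed heartbeats are assumptions made for illustration; they are not taken from PICMG 2.13 or from any vendor's implementation.

```python
from dataclasses import dataclass, field

# The three responsibilities the text says a System Slot provides.
ROLES = ("clock_driver", "bus_arbiter", "interrupt_controller")


@dataclass
class SystemSlot:
    name: str
    healthy: bool = True
    roles: set = field(default_factory=set)


@dataclass
class BusSegment:
    active: SystemSlot
    standby: SystemSlot
    missed_heartbeats: int = 0
    miss_limit: int = 3  # assumed threshold, purely illustrative

    def __post_init__(self):
        # The active System Slot starts out owning every role.
        self.active.roles = set(ROLES)

    def heartbeat(self, received: bool) -> bool:
        """Record one heartbeat interval; return True if a failover occurred."""
        if received:
            self.missed_heartbeats = 0
            return False
        self.missed_heartbeats += 1
        if self.missed_heartbeats < self.miss_limit:
            return False
        return self.fail_over()

    def fail_over(self) -> bool:
        """Hand all roles to the standby at once; refuse if it is unhealthy."""
        if not self.standby.healthy:
            return False  # nothing to fail over to
        failed = self.active
        failed.healthy = False
        # Clocks, arbitration and interrupts must transfer as one unit.
        self.standby.roles, failed.roles = set(ROLES), set()
        self.active, self.standby = self.standby, failed
        self.missed_heartbeats = 0
        return True
```

Even this simplified model has to decide when a slot counts as failed and has to move all three roles atomically. On real hardware each of those steps involves live electrical signals on a shared bus, which is where the complexity described above comes from.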
Adding IP Data Transport to CompactPCI
In September 2001, PICMG approved the PICMG 2.16 Packet Switched Backplane specification. This specification defines 10/100/1000 Mbit/s Ethernet interconnects between Peripheral Slots and Fabric Slots in a CompactPCI segment. The Fabric Slots are redundant. PICMG 2.16 compliant systems have been deployed in a variety of applications. The ubiquitous nature of the Ethernet interconnects and the need for IP data transports have led to high levels of adoption among system providers. Figure 3 shows a typical PICMG 2.0 and 2.16 architecture.
In PICMG 2.16 compliant systems the IP data transport can be used as
the primary communications channel within the system. This communications
path has redundant links to redundant Fabric Slots.
The PICMG 2.16 specification allows an architect to avoid using the CompactPCI
bus altogether, and provides a way of increasing system availability without
increasing the cost of the system. PICMG 2.16 compliant systems are inherently
redundant - there is no single point of failure. The Ethernet fabric is
a convenient way to handle the packet-based data transport that we see in
next-generation applications.
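The difference between a shared PCI bus and the dual-fabric arrangement can be checked mechanically. The Python sketch below models a topology as a list of links and reports which shared elements, if removed, would cut the boards off from each other. The slot and fabric names are hypothetical, and the check covers only the shared interconnect elements (the bus or the fabric switches), not individual boards, cables or power.

```python
from collections import deque


def connected(adjacency, nodes, removed):
    """True if all `nodes` can still reach each other with `removed` gone."""
    nodes = [n for n in nodes if n != removed]
    if len(nodes) < 2:
        return True
    seen = {nodes[0]}
    queue = deque([nodes[0]])
    while queue:  # breadth-first search that skips the removed element
        current = queue.popleft()
        for neighbour in adjacency.get(current, ()):
            if neighbour != removed and neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return all(n in seen for n in nodes)


def single_points_of_failure(links, endpoints):
    """Shared elements whose loss disconnects the endpoint boards."""
    adjacency = {}
    for a, b in links:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    shared = sorted(set(adjacency) - set(endpoints))
    return [e for e in shared if not connected(adjacency, endpoints, e)]


slots = ["slot1", "slot2", "slot3"]

# First-generation layout: every board hangs off one PCI bus.
bus_links = [(s, "pci_bus") for s in slots]

# PICMG 2.16 style layout: every board has a link to each of two fabrics.
fabric_links = [(s, f) for s in slots for f in ("fabric_a", "fabric_b")]

print(single_points_of_failure(bus_links, slots))     # ['pci_bus']
print(single_points_of_failure(fabric_links, slots))  # []
```

With one shared bus the bus itself is reported as a single point of failure; with two fabrics, losing either one still leaves every board reachable through the other.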
The next step in the evolution of highly available CompactPCI systems
is the removal of the System Slot. As applications take advantage of the
IP interconnects in today's systems, the PCI bus is becoming an unused
expense. PICMG is working on a specification called CompactTCA. The CompactTCA
specification is expected to combine the system management capabilities
defined in AdvancedTCA (PICMG 3.0), the form factor defined in PICMG 2.0,
and the data transport defined in PICMG 2.16. This architecture will not
contain a PCI bus. This kind of system will be able to support 24 Peripheral
Slots and two Fabric Slots. The elimination of the PCI bus will reduce
the cost of the boards used in CompactPCI systems, reduce the complexities
of providing redundant system slots and increase the total slot count.
Figure 4 shows an example of a possible CompactTCA system.
Summary
PICMG 2.16 Packet Switched Backplane is a viable way to improve the availability
of systems built today. The elimination of single points of failure found
in first generation CompactPCI systems and the addition of redundant data
transports provide the building blocks necessary to achieve 5NINES availability.
System designers should beware of vendors providing products based on
proprietary Redundant System Slot architectures. These closed architecture
systems will not benefit from the CompactPCI ecosystem that exists today.
It is clear that CompactPCI systems using PICMG 2.16 Packet Switched Backplanes
will provide the combination of point-to-point data transports and redundancy
necessary to achieve 5NINES availability, as well as a migration
path to future technologies.
Contributed by ADLINK Technology; edited by CTI Forum