subject: IT Stability Step 2: Don't Count on Good Luck
As I mentioned in my previous post, the most reliable way to bring down a network is to depend on regular, manual intervention in your IT systems. However, exposure to human error is relatively easy to address. Technical failure (hardware and software), on the other hand, is a much tougher problem. As it happens, technical failure is the number one cause of systems failure and data loss for SMBs.
Given how complex modern technology is, most devices are astoundingly reliable. But it's a mistake to assume a device or system will always work as planned. SMBs must *assume* that any of their systems could fail at any time, and plan accordingly. How does one plan for random failures? There is a one-word answer for that: redundancy.
Any system that is critically important to an SMB's operations should have some level of redundancy: a secondary system that takes over if the primary device fails. The most common type of redundancy you'll see is RAID (Redundant Array of Inexpensive Disks), which is almost universal these days on servers in an SMB environment. The idea of RAID is to store the server's data redundantly across multiple disks in such a manner that if any one disk fails, the remaining disks can fill in the data that was lost.
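To make "fill in the data that was lost" concrete, here is a minimal sketch of the parity idea used by some RAID levels. It is an illustration only, not how a real RAID controller is implemented: the disk contents and the `xor_blocks` helper are made up for this example, and each "disk" is just a short byte string.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three hypothetical data disks, each holding one 8-byte block.
data_disks = [b"INVOICES", b"PAYROLL!", b"CONTACTS"]

# A fourth disk stores the parity: the XOR of all the data blocks.
parity = xor_blocks(data_disks)

# Simulate losing the second disk. XOR-ing the surviving data blocks
# with the parity block reproduces the missing block.
survivors = [data_disks[0], data_disks[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data_disks[1])
```

The same trick works no matter which single disk is lost, which is why one extra disk's worth of parity is enough to survive one failure. Losing two disks at once is a different story, and is one reason RAID does not replace backups.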
The challenge with redundancy is mostly economic. Obviously, it is not possible to have a secondary or replacement unit for *everything*. SMBs need to weigh four things: 1) which systems are critical; 2) how much downtime the SMB can tolerate for each of those systems; 3) how much the redundant system costs above the baseline; and 4) how well the redundant system performs, that is, how much it reduces downtime compared with the baseline.
Here are some examples:
| Critical System | Solution | Cost over basic system | Performance over basic system |
|---|---|---|---|
| WAN (internet) network connection | Multiple internet service lines and compatible router | $ | Excellent: generally provides instantaneous re-routing of traffic in cases of an outage. Some routers require a manual switch. |
| Router | Multiple routers using a redundancy protocol | $$ | Usually less than a minute of downtime in event of device failure. |
| Router | Have a pre-configured spare handy. | $ | Requires time to diagnose and manually replace router. Hours of downtime depending on availability of technicians. |
| Server | Clustering | $$$ | Excellent: servers automatically adjust to load. |
| Server | Backup server with virtualization capability | $$ | Good: 1-4 hours of downtime. |
| Server | Have a spare handy and a backup with bare-metal restore capability. | $$ | Adequate: 8-48 hours of downtime. |
| Backup Media | Add an offsite component to supplement local storage. | $ | Excellent. |
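One rough way to turn the table into a decision is to pick the cheapest option whose worst-case downtime still fits your tolerance. The sketch below does this for the three server rows. Treat it as a thinking aid rather than a sizing tool: the `Option` class and `cheapest_within_tolerance` function are invented for this example, the cost tier is simply the number of dollar signs in the table, and the 0 hours for clustering is an assumption, since the table gives no figure for it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Option:
    name: str
    cost_tier: int           # number of "$" signs in the table
    worst_case_hours: float  # upper end of the downtime range in the table


# Server rows from the table. Clustering has no downtime figure there;
# 0 hours is an assumption made for this sketch.
SERVER_OPTIONS = [
    Option("Clustering", 3, 0.0),
    Option("Backup server with virtualization", 2, 4.0),
    Option("Spare plus bare-metal restore", 2, 48.0),
]


def cheapest_within_tolerance(
    options: List[Option], tolerance_hours: float
) -> Optional[Option]:
    """Return the lowest-cost option whose worst case fits the tolerance."""
    candidates = [o for o in options if o.worst_case_hours <= tolerance_hours]
    if not candidates:
        return None
    # Cheapest first; when cost tiers tie, prefer the shorter downtime.
    return min(candidates, key=lambda o: (o.cost_tier, o.worst_case_hours))


for hours in (1, 8, 72):
    choice = cheapest_within_tolerance(SERVER_OPTIONS, hours)
    print(hours, "->", choice.name if choice else "nothing fits")
```

Because the dollar-sign tiers are coarse, the two $$ options tie on cost and the tie goes to the one with less downtime. With real quotes in place of the tiers, the spare-plus-restore option may well come out ahead for businesses that can tolerate a day or two offline.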
This just scratches the surface of the redundancy options available to SMBs. Here's some homework for you: what is your most critical system? What would happen to your business if it failed right now? How long can you tolerate the system being down without a redundant system kicking in? Leave a comment and we can discuss your redundancy options!