Connections Third Quarter 2005
a newsletter for our clients and friends





The Critical Need for Continuity Planning

As systems for computing and communication become more integral to learning, teaching and administration, the need for reliability and resilience in the face of crisis and disruption becomes ever more critical. Additionally, because the school system is most often the largest employer, and certainly the most pervasive organization, within any community, schools become a vital resource as places of refuge and coordination in times of crisis. In our experience with districts large and small, there is seldom an adequate plan of action to mitigate risk and sustain systems operation when a crisis faces the schools or the community.


This article concerns communication and computing, but we and our partners are also able to conduct threat assessments and to develop facility security systems and safety protocols for students and staff.



 

Steps to Success

Continuity Planning is generally undertaken to protect and maintain information and communication systems in the face of disruptive events. Most importantly, however, its objective must be to sustain and re-establish workflow and operations for learning, teaching and administration with as little disruption as possible. In every case it is important that information survives and communication is available, but in the majority of cases it is vital that the processes of administration, teaching and learning are sustained as well. We have discovered that in a worst-case scenario information systems can often be reduced to pencil and paper, but without an understanding of which operations are most important, even this limited capability may not be used effectively. And, as has been repeatedly discovered, reliable communication technology is vital for an effective response.

The fundamental goal for continuity planning is twofold:
• To anticipate and prevent as many potential hazards and crises as possible, and
• To achieve a state of operational continuity where infrastructure technology systems are continuously available irrespective of crisis events.

For this reason we call this service area Continuity Planning instead of disaster recovery, which is the more common term. In addition, our concept of continuity planning is biased toward action instead of simply drafting a plan. We find there are four steps to successful continuity planning:
• Readiness: understanding critical processes, developing strategies, eliminating risk
• Response: identifying crisis events, taking protective action, controlling damage
• Recovery: re-establishing critical information and communication systems
• Reconstruction: re-building systems and transitioning processes back to normal.

Readiness addresses five areas:
• Creating awareness for hazards and potential crisis events,
• Mapping processes and their information and systems requirements,
• Developing a plan of action to respond and rebuild,
• Gathering resources to respond, and
• Eliminating potential hazards and guarding against potential crises.

Awareness means building the capability throughout the staff to identify imminent crisis situations or potential problems, to know whom to notify and to know how to respond if a crisis happens. Mapping processes and information clarifies which processes, and therefore which systems and information, are most critical, and identifies the optimum method of protection for riding out or rebuilding from a crisis. A plan catalogs the actions to be taken during response, recovery and reconstruction, depending on the form and severity of the crisis, the level of personal hazard and the time available. Having the necessary resources ready at hand, whether they be sheets of plastic or backup media, is critical to an effective response. Most important of all, however, is the elimination of potential crises through prudent action. This spans adherence to a data backup plan, locating critical equipment away from water and waste piping, implementing backup power sources, being aware of who is in the building and why, and removal of fire hazards, among a myriad of other measures that will eliminate up to 90% of the potential hazards and crisis situations that could befall your systems and information.

Response is being able to know quickly that a crisis is imminent or has happened, and having the means and knowledge to mitigate the crisis, to minimize the loss or to simplify recovery as much as possible. Sensing a crisis involves proper surveillance and security protocols and the knowledge to recognize what is happening. In many cases it is a matter of being aware of, and heeding, warning signs. Mitigation is taking steps to resolve the crisis or reduce its effect. This could be an orderly shutdown or protection from damage. Minimizing loss and simplifying recovery involve actions that protect systems so they can better withstand the crisis event. Of course, all of this depends on the time available to mount a response and the hazard presented to the respondent. With proper readiness, loss should be minimized drastically even if an effective response is not possible.

Recovery is re-establishing the most important systems and information at a safe location or, in limited circumstances, at the original facility. It also includes engaging the processes and activities that are necessary to support the most important functions of administration, teaching and learning. The goal of recovery should be to assure reliable systems and information, so that users can carry on their activities with as little compromise as practical. The level of recovery could span a significant range of systems and services, and could scale from a single piece of equipment to a complete facility. There are practical limits, however, and users must understand how to adapt. In addition, affected persons must be kept aware of what recovery efforts are under way and on what timeframe, in order to build confidence and avoid concern. For example, paychecks should always go out, and links of communication should be re-established as the first line of recovery.

Reconstruction is returning operations to their normal condition at a permanent location. This is more than making temporary provisions permanent; it is establishing the normal routines and processes as well as re-developing the readiness provisions to address future crisis events. While it may be tempting, reconstruction is not the time to consider new processes or major systems upgrades.

Development

It is vital to first identify critical processes and the underlying information systems supporting them so that an optimum protection and recovery strategy can be put into practice.

The structure of operational processes for administration, teaching and learning can be characterized by layers. At the bottom are systems, which include cabling, computers and networks. Next comes the data that resides on system equipment or traverses the network. Applications come next, turning the data into information, including documents and files, with the user interface associated with them. Finally come processes. Simply stated, a process is a set of persons, activities and resources that come together over time to accomplish a common task.

Development of the Continuity Plan begins with identification of critical processes and works down the layers toward a clear definition of the underlying data and systems necessary to sustain that process. At each layer, potential points of failure and disruptive events are identified and a plan is developed to prevent, remediate or mitigate the occurrence. Overall, this reduces the risk of process disruption while providing the means to sustain or recover the process should a disruption occur. Additionally, during times of crisis, certain information becomes more vital than during normal operations.
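The top-down walk described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Component` class, the `dependencies` and `failure_report` helpers, and the payroll example with its points of failure are all hypothetical and are not drawn from any actual plan.

```python
from dataclasses import dataclass, field

# Layers from the text, ordered top (processes) to bottom (systems).
LAYERS = ["process", "application", "data", "system"]


@dataclass
class Component:
    name: str
    layer: str                                           # one of LAYERS
    depends_on: list = field(default_factory=list)       # components it relies on
    failure_points: list = field(default_factory=list)   # known weak spots


def dependencies(root):
    """Return root plus everything it relies on, sorted top layer first."""
    found, seen, stack = [], set(), [root]
    while stack:
        item = stack.pop()
        if item.name in seen:
            continue
        seen.add(item.name)
        found.append(item)
        stack.extend(item.depends_on)
    return sorted(found, key=lambda c: LAYERS.index(c.layer))


def failure_report(root):
    """Map each component under root to its recorded points of failure."""
    return {c.name: c.failure_points for c in dependencies(root) if c.failure_points}


# Hypothetical example: a district payroll process.
server = Component("finance server", "system", failure_points=["no backup power"])
network = Component("district network", "system", failure_points=["single uplink"])
records = Component("payroll records", "data", [server, network], ["no off-site copy"])
app = Component("payroll application", "application", [records])
payroll = Component("payroll", "process", [app])

for name, points in failure_report(payroll).items():
    print(f"{name}: {', '.join(points)}")
```

Starting from the process rather than from the equipment keeps the inventory focused on what a critical process actually needs, which is the direction the text recommends.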

The importance of a process is based on multiple factors that include:
• The significance of the process to the functioning of the organization.
• The cost to recover or re-create the information, work or outcome if the process is disrupted.
• The perception of organizational viability during a disruption.
• The scope of the process in proportion to the organization.
• The effort to reconstruct the data vs. the effort to protect it.

Each of these factors is evaluated individually, and then they are weighed as a group to arrive at an overall importance rating. Some processes are critically important for one or more reasons at all times. Some processes may be significant only at specific times, or to a proportion of the organization that changes with time. Other processes and their underlying systems may actually be simpler to rebuild than to protect.
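One simple way to weigh the factors as a group is a weighted average. The sketch below assumes each factor is scored from 1 (low) to 5 (high); the factor names, the scale, the equal default weights and the sample scores are illustrative assumptions, not a rating method prescribed here.

```python
# Hypothetical short names for the five factors listed above.
FACTORS = (
    "significance",           # significance to the functioning of the organization
    "recovery_cost",          # cost to recover or re-create the work
    "perceived_viability",    # effect on the perception of organizational viability
    "scope",                  # scope in proportion to the organization
    "reconstruction_effort",  # effort to reconstruct the data vs. protect it
)


def importance(scores, weights=None):
    """Weighted average of the factor scores (assumed 1-5 scale)."""
    weights = weights or {f: 1.0 for f in FACTORS}
    missing = [f for f in FACTORS if f not in scores]
    if missing:
        raise ValueError(f"missing scores for: {missing}")
    total_weight = sum(weights[f] for f in FACTORS)
    return sum(scores[f] * weights[f] for f in FACTORS) / total_weight


# Hypothetical scores for two processes.
processes = {
    "payroll": dict(zip(FACTORS, (5, 4, 5, 5, 3))),
    "lunch menu posting": dict(zip(FACTORS, (2, 1, 1, 2, 1))),
}

ranked = sorted(processes, key=lambda name: importance(processes[name]), reverse=True)
for name in ranked:
    print(f"{name}: {importance(processes[name]):.1f}")
```

Because some processes matter only at certain times of the year, the scores could be re-evaluated on a calendar rather than fixed once.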

Critical processes at all levels in the district are vulnerable to disruptive events. Insurers have identified the most common forms of crisis affecting information and communication systems to include:
• Virus and malware attacks
• Power outages
• Human error
• Fires
• Water leaks
• Premises liability issues (code violations and facility hazards).

The general range of hazards includes:
• Natural – hurricane, tornado, fire, flood, etc.
• Human – operator error, sabotage, malicious code, terror attacks, etc.
• Systemic – equipment failure, software error, telecommunications network outage, electric power failure, etc.

These causes must be considered first when developing a continuity plan. While fire and water may be highly destructive, most disruptions are temporary and do not damage systems equipment or the network; instead, data is lost or applications are corrupted. This set of situations is best mitigated through effective backup strategies that include re-loading applications and recovering data to a prior point in time.
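Recovering data to a prior point in time comes down to choosing the most recent backup taken before the disruption and knowing how much work falls in the gap. The following sketch assumes backups are tracked simply by their timestamps; the function name and the sample dates are hypothetical.

```python
from datetime import datetime


def latest_backup_before(backups, disruption_time):
    """Return the most recent backup taken before the disruption, or None."""
    candidates = [b for b in backups if b < disruption_time]
    return max(candidates) if candidates else None


# Hypothetical nightly backups taken at 11 p.m.
backups = [
    datetime(2005, 9, 1, 23, 0),
    datetime(2005, 9, 2, 23, 0),
    datetime(2005, 9, 3, 23, 0),
]
disruption = datetime(2005, 9, 3, 14, 30)  # data corrupted mid-afternoon

restore_point = latest_backup_before(backups, disruption)
if restore_point is None:
    print("No usable backup predates the disruption.")
else:
    # Work done between the restore point and the disruption must be re-entered.
    print(f"Restore from {restore_point}; re-create {disruption - restore_point} of work.")
```

The gap between the restore point and the disruption is the work that has to be re-created by hand, which is one practical way to judge whether a backup schedule is frequent enough for a given process.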

The next set of situations requiring consideration involves recovery from a truly catastrophic event, in which the facility and the systems it contains are destroyed or rendered unusable and the entire systems infrastructure must be rebuilt at another location to support operations. This set of situations is best mitigated with effective readiness that includes data backup, but also includes the materials and procedures to prevent or lessen the damage to systems.

In any case, the key to effective protection and recovery is readiness. Readiness includes backup and redundancy of data, but it also includes awareness of the need for data protection, of the nature of common disruptive events and of the need for routine maintenance, along with the elimination, insofar as possible, of hazards that could cause a disruption.

Implementation

Steps to achieve readiness and start continuity planning include:
• Identify critical processes fundamental to operation of the department or school
• Identify irreplaceable information
• Define supporting applications and resources
• Develop facility and physical protection
• Define responsible staff and user base
• Create continuity plan
• Eliminate hazards
• Plan and test refresh procedures
• Remove points of failure within systems
• Remove facility deficiencies
• Design and implement robust systems
• Back up and protect data
• Eliminate high-risk exposure
• Engage training in crisis awareness and response methods
• Sustain communication systems if possible.

Steps to respond to crisis events include:
• Implement and follow protective procedures
• Identify crisis
• Notify internal and external stakeholders
• Initiate mitigation procedures, if possible
• Attempt controlled shutdown
• Implement backup channels

Steps to recover critical systems and processes include:
• Stage recovery procedures
• Secure access to backup facility
• Integrate necessary system components
• Restore operating system
• Restore application software
• Obtain and load backup application data
• Test system functionality
• Connect primary users to systems
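The recovery steps above can be rehearsed as a simple checklist runner. The sketch below assumes the steps are carried out in the order listed and that each must succeed before the next begins; that ordering rule, the `run_recovery` function and the simulated drill are illustrative assumptions rather than a prescribed procedure.

```python
RECOVERY_STEPS = [
    "Stage recovery procedures",
    "Secure access to backup facility",
    "Integrate necessary system components",
    "Restore operating system",
    "Restore application software",
    "Obtain and load backup application data",
    "Test system functionality",
    "Connect primary users to systems",
]


def run_recovery(steps, perform):
    """Run steps in order, stopping at the first one that fails.

    Returns (completed_steps, failed_step); failed_step is None on success.
    """
    completed = []
    for step in steps:
        if not perform(step):
            return completed, step
        completed.append(step)
    return completed, None


# Hypothetical drill in which the backup media cannot be loaded.
def drill(step):
    return step != "Obtain and load backup application data"


done, blocked = run_recovery(RECOVERY_STEPS, drill)
print(f"Completed {len(done)} of {len(RECOVERY_STEPS)} steps; blocked at: {blocked}")
```

Running a drill like this before a crisis shows which step is likely to stall a real recovery, so the missing resource can be gathered as part of readiness.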

Steps to reconstruct and re-establish systems and services include:
• Develop or recondition facilities
• Establish new, duplicate systems if possible
• Notify users and schedule transition
• Conduct orderly shutdown and transfer data
• Bring up and test new systems
• Relocate staff and re-establish processes.




 


fine print...

Please tell us what you think of the information or layout by sending a note to newsletter@millenniumstrategies.com.

This newsletter is an expression of our insight and opinion. The information presented here is provided without warranty and we advise prudent and diligent thought before using it.

This document may not be copied in whole or in part by any means unless you write to us and ask and we write back and tell you it is ok.






© 2005 Millennium Strategies, all rights reserved