Disaster Recovery
Scope and Objectives of Recovery Plan
This plan is limited in scope to recovery and business continuance from a serious disruption in activities due to non-availability of migVisor’s facilities.
The objective of this plan is to coordinate the recovery of critical business functions, and to manage and support that recovery, in the event of a facilities disruption or disaster.
This includes short- or long-term disasters and other disruptions, such as fires, floods, earthquakes, explosions, terrorism, tornadoes, extended power interruptions, hazardous chemical spills, and other natural or man-made disasters.
Business Continuity and DR Plan Description
migVisor relies on the availability of some cloud vendor services, as well as internal services for normal operation. Below is a list of the major services and the guaranteed SLA availability per service:
Guaranteed SLA Availability
Service | Availability Target | Provider |
---|---|---|
Elasticsearch | >= 99.95% | migVisor |
Cloud Storage | >= 99.95% | CSP |
Cloud SQL | >= 99.99% | CSP |
Kubernetes Engine | >= 99.95% | CSP |
Pub/Sub | >= 99.95% | CSP |
Cloud DNS | 100% | CSP |
Cloud Load Balancing | >= 99.99% | CSP |
Mean guaranteed SLA across the seven services: (99.95% + 99.95% + 99.99% + 99.95% + 99.95% + 100% + 99.99%) / 7 ≈ 99.9686% availability
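As a check on the arithmetic, the mean of the seven availability targets in the table can be recomputed with a short script. This is a minimal sketch: the dictionary simply copies the table values, and `yearly_downtime_minutes` is a hypothetical helper, not part of the plan, that converts a target into the downtime it allows over a 365-day year.

```python
# Availability targets (percent), copied from the SLA table above.
SLA_TARGETS = {
    "Elasticsearch": 99.95,
    "Cloud Storage": 99.95,
    "Cloud SQL": 99.99,
    "Kubernetes Engine": 99.95,
    "Pub/Sub": 99.95,
    "Cloud DNS": 100.0,
    "Cloud Load Balancing": 99.99,
}


def mean_sla(targets):
    """Arithmetic mean of the availability targets, in percent."""
    return sum(targets.values()) / len(targets)


def yearly_downtime_minutes(availability_percent, days_per_year=365):
    """Downtime a target allows per year (illustrative helper, assumed here)."""
    minutes_per_year = days_per_year * 24 * 60
    return minutes_per_year * (1 - availability_percent / 100)


print(f"Mean guaranteed SLA: {mean_sla(SLA_TARGETS):.4f}%")
# prints "Mean guaranteed SLA: 99.9686%"
```

Note that this figure is a simple average of the per-service targets; it is not the availability of the system as a whole, which depends on how the services are combined.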
Operations At Risk
Risks that can affect migVisor’s operations include natural disasters, cyber-attacks, and the loss of a critical CSP.
Managing these risks involves:
Determining how those risks will affect operations
Implementing Threat Modeling to mitigate the risks
Testing procedures to ensure they work
Reviewing the process to make sure that it is up to date
Migrating the infrastructure to an alternative CSP
Service Resiliency
Process | Processing Schedule |
---|---|
Working hours for incident processing | 8x7 (GMT+2) |
Response time | 1 business day |
Recovery plan testing frequency | Annually |
Recovery Strategy
The recovery strategy is organized in the following order:
Identification of the incident (automatic via monitoring tools, or user reported)
Investigation phase
DRP activation phase
Recovery implementation phase
Return to normal operation
Each activity is assigned to the appropriate team members, who have primary responsibility for completing it.
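The ordering of the phases above can be sketched as a simple sequence that only moves forward. This is an illustrative model under stated assumptions, not migVisor's tooling: the names `RecoveryPhase`, `next_phase`, and `run_recovery` are hypothetical, and a real process may revisit earlier phases.

```python
from enum import IntEnum


class RecoveryPhase(IntEnum):
    """Phases of the recovery strategy, numbered in the order listed above."""
    IDENTIFICATION = 1           # automatic via monitoring tools, or user reported
    INVESTIGATION = 2
    DRP_ACTIVATION = 3
    RECOVERY_IMPLEMENTATION = 4
    NORMAL_OPERATION = 5


def next_phase(current):
    """Return the phase after `current`, or None once normal operation is reached."""
    if current is RecoveryPhase.NORMAL_OPERATION:
        return None
    return RecoveryPhase(current + 1)


def run_recovery(start=RecoveryPhase.IDENTIFICATION):
    """Walk the phases in order from `start` and return the sequence visited."""
    visited = []
    phase = start
    while phase is not None:
        visited.append(phase)
        phase = next_phase(phase)
    return visited


for phase in run_recovery():
    print(phase.value, phase.name)
```

Modeling the phases as an ordered enumeration makes it easy to record which phase an incident is in and to reject out-of-order transitions.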
Recovery implementations by provider/resource type:
Provider/Resource Type | Recovery Implementation |
---|---|
Elasticsearch | |
Cloud Storage | |
Cloud SQL | |
Kubernetes Engine | |
Pub/Sub | |
Cloud DNS | |
Cloud Load Balancing | |
Roles and Responsibilities
This section describes the key personnel and their assigned tasks during or after an incident. Each team member has a unique set of responsibilities for successfully completing the BCP for each business function.
Roles and Responsibilities
Role | Area of responsibility |
---|---|
On-duty support engineer | Monitoring for suspicious or abnormal activity; initial investigation and notification of the DR team; fixing minor issues |
Support team | Detailed investigation; recovery actions; restoration of affected systems to normal operation; security and DR testing |
DR team head | Impact assessment; DR plan activation and decisions on the necessary alternative strategy and recovery methods; informing customers and partners about potential harm; post-incident analysis and DRP improvement |