
Disaster Recovery

Scope and Objectives of Recovery Plan

This plan is limited in scope to recovery and business continuity following a serious disruption in activities caused by the non-availability of migVisor’s facilities.

The objective of this plan is to coordinate the recovery of critical business functions, and to manage and support business recovery, in the event of a facilities disruption or disaster.

This includes short-term or long-term disasters and other disruptions, such as fires, floods, earthquakes, explosions, terrorism, tornadoes, extended power interruptions, hazardous chemical spills, and other natural or man-made disasters.

Business Continuity and DR Plan Description

migVisor relies on the availability of several cloud service provider (CSP) services, as well as internal services, for normal operation. Below is a list of the major services and the guaranteed SLA availability of each:

Guaranteed SLA Availability

Service               Availability Target   Provider
Elasticsearch         >= 99.95%             migVisor
Cloud Storage         >= 99.95%             CSP
Cloud SQL             >= 99.99%             CSP
Kubernetes Engine     >= 99.95%             CSP
Pub/Sub               >= 99.95%             CSP
Cloud DNS             = 100%                CSP
Cloud Load Balancing  >= 99.99%             CSP

Mean guaranteed SLA: the mean of the seven availability targets (99.95%, 99.95%, 99.99%, 99.95%, 99.95%, 100%, 99.99%) gives >= 99.9686% availability
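The mean figure can be checked with a few lines of Python. This is a minimal sketch: the targets are copied from the table above, while the function names and the conversion to a yearly downtime budget are illustrative additions, not part of the plan itself.

```python
# Availability targets in percent, copied from the Guaranteed SLA Availability table.
SLA_TARGETS = {
    "Elasticsearch": 99.95,
    "Cloud Storage": 99.95,
    "Cloud SQL": 99.99,
    "Kubernetes Engine": 99.95,
    "Pub/Sub": 99.95,
    "Cloud DNS": 100.0,
    "Cloud Load Balancing": 99.99,
}


def mean_sla(targets):
    """Arithmetic mean of the per-service availability targets, in percent."""
    return sum(targets.values()) / len(targets)


def downtime_minutes_per_year(availability_pct, days=365):
    """Downtime budget implied by an availability percentage (illustrative helper)."""
    return (1 - availability_pct / 100.0) * days * 24 * 60


if __name__ == "__main__":
    mean = mean_sla(SLA_TARGETS)
    print(f"Mean guaranteed SLA: {mean:.4f}%")
    print(f"Implied downtime budget: {downtime_minutes_per_year(mean):.1f} min/year")
```

Note that an arithmetic mean summarizes the individual guarantees; it does not by itself describe the availability of the whole system when every service is required at the same time.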

Operations At Risk

Risks that can affect migVisor’s operations include natural disasters, cyber-attacks, and the loss of a critical CSP.

Managing these risks involves:

  • Determining how those risks will affect operations

  • Implementing Threat Modeling to mitigate the risks

  • Testing procedures to ensure they work

  • Reviewing the process to make sure that it is up to date

  • Migrating the infrastructure to an alternative CSP

Service Resiliency

Process                                Processing Schedule
Working hours for incident processing  8x7 (GMT+2)
Response time                          1 business day
Recovery plan testing frequency        Annually

Recovery Strategy

The recovery strategy is organized in the following order:

  1. Identification of the incident (automatic via monitoring tools, or user reported)

  2. Investigation phase

  3. DRP activation phase

  4. Recovery implementation phase

  5. Return to normal operation

Each activity is assigned to the appropriate team members, who have primary responsibility for completing it.
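The ordering of the five phases can be sketched as a small state tracker. This is an illustration only: the class names, the `complete_phase` method, and the owners passed in the usage example are hypothetical and not taken from migVisor's tooling.

```python
from enum import IntEnum


class RecoveryPhase(IntEnum):
    """The five recovery phases, numbered in the order listed in the plan."""
    IDENTIFICATION = 1
    INVESTIGATION = 2
    DRP_ACTIVATION = 3
    RECOVERY_IMPLEMENTATION = 4
    RETURN_TO_NORMAL = 5


class IncidentTracker:
    """Hypothetical tracker that only lets an incident move forward one phase at a time."""

    def __init__(self):
        self.phase = RecoveryPhase.IDENTIFICATION
        self.history = []   # (phase, owner) pairs, in completion order
        self.closed = False

    def complete_phase(self, owner):
        """Record who completed the current phase, then advance to the next one."""
        if self.closed:
            raise RuntimeError("incident already returned to normal operation")
        self.history.append((self.phase, owner))
        if self.phase is RecoveryPhase.RETURN_TO_NORMAL:
            self.closed = True
        else:
            self.phase = RecoveryPhase(self.phase + 1)
        return self.phase


if __name__ == "__main__":
    tracker = IncidentTracker()
    # Owners below are assumptions chosen for the example.
    for owner in ["on-duty support engineer", "support team", "DR team head",
                  "support team", "support team"]:
        tracker.complete_phase(owner)
    print([(phase.name, owner) for phase, owner in tracker.history])
```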

Recovery implementations by provider/resource type:

Provider/Resource Type

Recovery Implementation

Elasticsearch

  • Elasticsearch nodes located across multiple zones within a region

  • Data in Elasticsearch is replicated across the nodes

  • Persistent volumes maintain storage availability independently of the individual containers

Cloud Storage

  • Dual-region or multi-region support

  • Service is resilient and not interrupted

  • Data and metadata stored redundantly across regions

  • Object versioning and cloud backup

Cloud SQL

  • Database support is regional with high availability

  • Backup location is multi-regional in the United States

  • Cloud storage is multi-regional with high availability

  • Point-in-time recovery is configured for protection against accidental deletion or writes

Kubernetes Engine

  • Distribute Kubernetes resources across multiple zones within a region

  • Persistent volumes maintain storage availability independently of the individual containers

  • Liveness probe restarts failed pods

  • Node auto repair

Pub/Sub

  • Replication is within a single region

  • Each topic uses three zones to store data

  • Synchronous replication is guaranteed to at least two zones, and best-effort replication to an additional third zone

Cloud DNS

  • Uses GCP's global network of Anycast name servers to serve clients' DNS zones from redundant locations around the world

  • Provides high availability and lower latency

Cloud Load Balancing

  • Software-based managed service

  • Distributed across multiple zones in the region
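The liveness probe listed under Kubernetes Engine needs something in each container to call. The sketch below shows one possible HTTP health endpoint using only the Python standard library. The `/healthz` path, the `HealthHandler` class, and the `healthy` flag are assumptions made for illustration; the plan does not say how migVisor's pods actually expose their health.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    # Hypothetical flag: the application would set this to False when it detects it is stuck.
    healthy = True

    def do_GET(self):
        if self.path != "/healthz":
            status, body = 404, b"not found"
        elif HealthHandler.healthy:
            status, body = 200, b"ok"
        else:
            # A non-2xx/3xx answer is what an HTTP liveness probe treats as a failure.
            status, body = 503, b"unhealthy"
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep probe traffic out of stderr


def start_health_server(host="127.0.0.1", port=0):
    """Serve the health endpoint in a background thread; port 0 picks a free port."""
    # Localhost is used for the demo; inside a pod the server would need to
    # listen on an address the kubelet can reach (for example 0.0.0.0).
    server = HTTPServer((host, port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    srv = start_health_server()
    print("health endpoint listening on port", srv.server_address[1])
    srv.shutdown()
    srv.server_close()
```

When the probe keeps failing past its configured threshold, Kubernetes restarts the container, which is the behavior the plan relies on.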

Roles and Responsibilities

This section describes key personnel and their assigned tasks during and after an incident. Each team member has a distinct set of responsibilities for carrying out the business continuity plan (BCP) for each business function.

Roles and Responsibilities

Role

Area of responsibility

On-duty support engineer

  • Monitoring suspicious or abnormal activity

  • Initial investigation and notification of the DR team

  • Fixing minor issues

Support team

  • Detailed investigation

  • Recovery actions

  • Restoration of affected systems to normal operation

  • Security and DR testing

DR team head

  • Impact assessment

  • DR plan activation; deciding on any necessary alternative strategy and recovery methods

  • Informing customers and partners about potential harm

  • Post-incident analysis and DRP improvement
