| Field | Value |
| --- | --- |
| Document ID | ISMS-BCP-001 |
| Version | 0.1 |
| Status | Draft |
| Classification | Internal |
| Owner | Managing Director |
| Approved by | Managing Director |
| Approval date | pending |
| Effective from | pending |
| Next review | Annually, after every exercise, and after every disruption that invokes this plan |
| Annex A controls | A.5.29 (information security during disruption), A.5.30 (ICT readiness for business continuity) |
This plan describes how BackupExperts continues to deliver — or
gracefully degrades — services to customers in the event of disruption,
and how it recovers afterwards. It accompanies, and is read together
with, the Backup Policy and the Incident
Response Policy.
Disruption scenarios in scope:
- Operator unavailability (illness, accident, family emergency,
holiday). Identified as the single dominant continuity risk for a
sole-proprietor operation.
- Loss of basement (fire, flood, theft, hardware failure cluster).
- Loss of Hetzner hosting (provider outage, account compromise,
regional incident).
- Loss of internet connectivity at the BackupExperts office.
- Loss of a critical SaaS (Microsoft 365, Lexoffice).
- Customer-side mass incident affecting multiple customers
simultaneously (e.g. a widespread ransomware campaign).
For each in-scope service, BackupExperts commits to:
| Service | Behaviour during disruption | Recovery target |
| --- | --- | --- |
| Customer backup ingestion | Customer-side Veeam continues writing to its local repository. Offload to BackupExperts MinIO may pause during disruption; this is acceptable provided the customer-side local copy continues. | Offload restored within the disruption-specific timeline (see §5). |
| Restore-on-demand | Degrades to "served on Operator availability" if the Operator is unavailable. Customers are informed via the channel in §6. | Per RPO/RTO commitments once the Operator is reachable. RPO/RTO are recorded in /continuity/rto-rpo (planned) but not yet formalised; see Risk Register R-004. |
| Tenant Wiki.js | Survives Operator unavailability (no Operator action required). Survives a Hetzner outage only to the extent of Hetzner's own continuity arrangements. | Per Hetzner SLA. |
| Monitoring | Continues unattended through Operator unavailability. Alerts queue for the Operator's return; critical alerts also page the cover contact (§4) once that arrangement is in place. | Acknowledgement on Operator return. |
A formal cover arrangement is not yet in place. This is recorded
as risk R-006 in the Risk Register. Until it
is in place:
- Customers are informed in writing (in their service contract) that
BackupExperts is operated by a single person and that response to
restore requests during Operator unavailability is best-effort.
- Customer-side Veeam local copies provide a fallback that does not
require BackupExperts intervention.
- The Operator notifies customers in advance of planned absences
greater than 48 hours, with the expected return date and a temporary
emergency contact.
The treatment plan: identify and contract with another MSP or trusted
technician to act as cover, with read access to the Vaultwarden
secrets needed for that role and a documented hand-over procedure.
Disruption-specific playbooks for each in-scope scenario follow.

Operator unavailability:

- Detection: the Operator notifies customers in advance (planned
absence) or fails to respond within their contractual response
window (unplanned absence).
- Containment: Backup ingestion continues unattended; restore
requests queue.
- Communication: automatic out-of-office reply with the cover
arrangement details (once §4 is implemented) or with a statement of
the current solo posture and an expected return date.
- Recovery: Operator returns; queued requests are triaged in order
of urgency; any missed contractual SLA is logged as a nonconformity.
Loss of basement:

- Detection: alarm, monitoring failure, or physical inspection.
- Containment: confirm whether customer-side Veeam copies are
intact (they typically are — basement is the off-site copy from
the customer's perspective, not the only copy). Communicate with
affected customers.
- Recovery: rebuild the MinIO endpoint at an alternative location;
a readiness check for the rebuilt endpoint is sketched after this
list. Until R-002 (second off-site copy) is implemented, recovery
depends on re-ingesting from customer-side Veeam local copies.
- Implication: until R-002 is closed, basement loss is the single
largest residual backup-availability risk and is acknowledged as
such on the Risk Register.
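One way to gate re-ingestion is to confirm that the rebuilt endpoint is reachable and that Object Lock is enabled on each bucket before any customer data flows to it. The sketch below is illustrative rather than part of this plan's commitments: the endpoint URL, credentials, and bucket names are placeholders, and it assumes the buckets are recreated with Object Lock enabled as the Backup Policy requires. MinIO speaks the S3 API, so the standard boto3 client works against it.

```python
"""Readiness check for a rebuilt MinIO endpoint (illustrative sketch).

All connection values and bucket names are placeholders, not real
BackupExperts settings. Requires boto3.
"""
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client(
    "s3",
    endpoint_url="https://minio.replacement.example:9000",  # placeholder
    aws_access_key_id="PLACEHOLDER",
    aws_secret_access_key="PLACEHOLDER",
)

# Placeholder bucket names; in practice, one bucket per customer.
buckets = ["customer-a-veeam", "customer-b-veeam"]

for bucket in buckets:
    try:
        s3.head_bucket(Bucket=bucket)  # raises if missing or unreachable
        lock = s3.get_object_lock_configuration(Bucket=bucket)
        state = lock["ObjectLockConfiguration"]["ObjectLockEnabled"]
        print(f"{bucket}: reachable, Object Lock {state}")
    except ClientError as err:
        # No bucket, or no lock configuration: re-ingestion must not start.
        code = err.response["Error"]["Code"]
        print(f"{bucket}: NOT ready ({code})")
```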
Loss of Hetzner hosting:

- Detection: Hetzner status page; loss of access to tenant Wiki.js
instances or to monitoring.
- Containment: verify whether the loss is regional, account-level,
or provider-level (a probe sketch follows after this list).
Customer backups are unaffected; their flow does not transit
Hetzner.
- Recovery: for app-tier disruption (Wiki.js, monitoring, wiki-cms
tooling), recreate from Hetzner backups or rebuild from the
wiki-cms repository to a new provider. The repository is the
source of truth; Wiki.js content is reproducible.
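A quick way to carry out the containment classification is to probe a handful of tenant endpoints alongside the Hetzner status page: all tenants down together with a status-page incident suggests a provider-level event, while a single unreachable tenant points at an instance-level one. The sketch below uses only the Python standard library; the tenant URLs are invented placeholders.

```python
"""Outage-scope probe (illustrative sketch; tenant URLs are placeholders)."""
import urllib.error
import urllib.request

endpoints = {
    "tenant-a wiki": "https://wiki.customer-a.example",  # placeholder
    "tenant-b wiki": "https://wiki.customer-b.example",  # placeholder
    "hetzner status page": "https://status.hetzner.com/",
}

for name, url in endpoints.items():
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            print(f"{name}: HTTP {resp.status}")
    except (urllib.error.URLError, TimeoutError) as err:
        # Unreachable: record it and compare across tenants to judge scope.
        print(f"{name}: unreachable ({err})")
```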
Loss of internet connectivity at the BackupExperts office:

- Detection: local indicators (e.g. router status, failed outbound
connectivity from the office network).
- Containment: Operator works from an alternative connection
(mobile hotspot or remote location). MinIO offload from customers
may queue at the customer side until restored.
- Recovery: ISP restoration. Document outage duration in the
records section.
Loss of a critical SaaS:

- Microsoft 365 outage: short-term, wait out the provider; no
immediate customer impact unless restore-coordination email is
unavailable. Use phone as the fallback channel.
- Lexoffice outage: invoicing is delayed; no customer-data impact.
- For both, follow provider status updates and notify customers only
if the service they receive is materially affected.
Customer-side mass incident:

- Detection: monitoring alerts, customer reports.
- Containment: acknowledge first; triage by severity; apply the
Incident Response Policy to each affected customer. Recall that
immutability on MinIO is the line of defence: Object Lock prevents
the customer's compromised credentials from deleting the basement
copies (a spot-check sketch follows after this list).
- Recovery: restore each customer per the Restore Procedure.
Engage cover (once §4 is implemented) to parallelise.
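During triage it is worth confirming, and recording, that retention really is in force on an affected customer's objects. A minimal spot-check, assuming the same placeholder endpoint and credentials as in the basement-loss sketch above:

```python
"""Object Lock retention spot-check (illustrative; all names are placeholders)."""
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client(
    "s3",
    endpoint_url="https://minio.backupexperts.example:9000",  # placeholder
    aws_access_key_id="PLACEHOLDER",
    aws_secret_access_key="PLACEHOLDER",
)

bucket = "customer-a-veeam"  # placeholder: bucket of the affected customer

# Sample a few objects and confirm each one carries a retention lock.
listing = s3.list_objects_v2(Bucket=bucket, MaxKeys=5)
for obj in listing.get("Contents", []):
    try:
        ret = s3.get_object_retention(Bucket=bucket, Key=obj["Key"])["Retention"]
        print(f"{obj['Key']}: {ret['Mode']} until {ret['RetainUntilDate']}")
    except ClientError:
        # An object without retention would be deletable with stolen creds.
        print(f"{obj['Key']}: NO retention set")
```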
During disruption, the Operator (or, once in place, the cover) is the
single communications point. Standing channels:
- Customer-facing: email and telephone, per the contact details
recorded in each customer's onboarding document.
- BackupExperts-internal: the ticket / register that triggered the
disruption response.
Customer-facing messages are factual, timestamped, and do not speculate.
The first message includes: what is known, what is not, what is being
done, and when the next update will follow.
The plan is exercised at minimum annually, using a tabletop scenario
covering at least one of the in-scope disruption types. The first
exercise is recorded in the DR test log
(planned). Findings become input to the risk register
and to the next iteration of this plan.
Records:

- DR test log (planned) — annual exercises
and any actual disruption-driven invocation.
- Incident register (planned) — disruption
events that meet the incident threshold.
- Risk register — outstanding continuity risks
(R-002, R-003, R-006 at time of issue).