The 3-2-1 Backup Rule
The 3-2-1 backup rule is one of the oldest and most reliable strategies for protecting data. Originally popularized by photographer Peter Krogh in his book The DAM Book: Digital Asset Management for Photographers, it remains the foundation of nearly every modern backup policy. The rule is simple: keep 3 copies of your data, on 2 different types of media, with 1 copy stored offsite.
The strategy is endorsed by CISA (the Cybersecurity and Infrastructure Security Agency, which absorbed US-CERT) and referenced in NIST SP 800-34 Rev. 1, the Contingency Planning Guide for Federal Information Systems, as a foundational element of data protection.
The Rule Explained
| | Copy 1 | Copy 2 | Copy 3 |
|---|---|---|---|
| Role | Production | Local Backup | Offsite Backup |
| Media | SSD / NVMe | HDD / NAS / Tape | Cloud / Remote DC |
| Location | Local Site | Local Site | Remote Site |
3 Copies
Maintain at least three copies of your data: the original (production) and two backups. A single backup is not enough because both the original and the backup can fail simultaneously. Disk failures, ransomware, accidental deletion, or software bugs can corrupt data without warning. With three copies, the probability of losing all of them at once drops dramatically.
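The benefit of extra copies can be made concrete with a back-of-the-envelope calculation. The sketch below (not from the original text) assumes each copy fails independently with the same annual probability; this is optimistic, since correlated events like ransomware or a site fire take out copies together, which is exactly why the rule also demands media and location diversity.

```python
def p_total_loss(p_fail: float, copies: int) -> float:
    """Probability that every copy is lost in the same period,
    assuming independent failures (optimistic: correlated events
    such as ransomware can hit all reachable copies at once)."""
    return p_fail ** copies

# With a hypothetical 5% annual failure chance per copy:
p_total_loss(0.05, 1)  # 0.05      -> 1 in 20
p_total_loss(0.05, 2)  # 0.0025    -> 1 in 400
p_total_loss(0.05, 3)  # 0.000125  -> 1 in 8,000
```

Even under these idealized assumptions, the third copy buys several orders of magnitude; breaking the independence assumption is what the "2 media" and "1 offsite" parts guard against.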
2 Different Media Types
Store backups on at least two different types of storage media. If all copies live on the same type of hardware, a single class of failure (firmware bug, batch defect, same-model vulnerability) can destroy everything at once.
Common media type combinations:
| Primary | Secondary |
|---|---|
| SSD/NVMe | HDD |
| HDD | Tape (LTO) |
| Local NAS | Cloud Object Storage |
| SAN | USB drives |
The key is that different media types have different failure modes. An SSD and an HDD will not fail for the same physical reason at the same time.
1 Offsite Copy
At least one copy must be stored in a geographically separate location. Local disasters (fire, flood, theft, power surge) can destroy all equipment at a single site. An offsite copy survives even if the entire primary site is lost.
Offsite options include:
- Cloud storage (Amazon S3, Backblaze B2, GCS, Azure Blob)
- Remote data center or colocation facility
- Physically transported tapes or drives stored at another location
RPO and RTO
Any backup strategy needs to answer two fundamental questions: how much data can you afford to lose, and how quickly do you need to recover?
Recovery Point Objective (RPO) defines the maximum acceptable amount of data loss measured in time. An RPO of 1 hour means you can tolerate losing up to 1 hour of data. This directly determines backup frequency: if your RPO is 1 hour, you need backups at least every hour.
Recovery Time Objective (RTO) defines the maximum acceptable downtime. An RTO of 4 hours means the system must be back online within 4 hours after a failure. This determines what kind of backup infrastructure you need and how fast your restore process must be.
| RPO | Backup Frequency | Example |
|---|---|---|
| 24 hours | Daily | Non-critical archives, documentation |
| 1 hour | Hourly snapshots | Business applications, internal tools |
| Minutes | Continuous replication | E-commerce, financial transactions |
| Near zero | Synchronous replication | Payment processing, trading systems |
The 3-2-1 rule does not prescribe specific RPO/RTO values, but the choice of tools and media directly affects both. Restoring from a local NAS is fast (low RTO) but a single daily backup means up to 24 hours of potential data loss (high RPO). Cloud-based continuous replication gives near-zero RPO but may have higher RTO depending on bandwidth and data volume.
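The link between backup interval and RPO can be expressed as a simple monitoring check: the newest snapshot must never be older than the RPO allows. A minimal sketch (function name and timestamps are illustrative):

```python
from datetime import datetime, timedelta

def rpo_violated(last_snapshot: datetime, now: datetime, rpo: timedelta) -> bool:
    """True if a failure right now would lose more data than the RPO allows."""
    return now - last_snapshot > rpo

now = datetime(2024, 6, 1, 12, 0)
rpo = timedelta(hours=1)
rpo_violated(datetime(2024, 6, 1, 11, 30), now, rpo)  # False: snapshot 30 min old
rpo_violated(datetime(2024, 6, 1, 9, 0), now, rpo)    # True: 3 hours of data at risk
```

A check like this belongs in monitoring rather than in the backup job itself, so that a job that stops running still triggers an alert.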
Cost Considerations
Backup strategy involves a tradeoff between cost and protection level. Understanding storage costs helps in choosing the right combination of media.
Storage Cost Comparison
| Media Type | Approximate Cost per TB/month | Durability | Access Speed |
|---|---|---|---|
| Local HDD (enterprise) | $1-2 (amortized over 5 years) | Moderate | Fast |
| Local NAS (RAID) | $3-5 (amortized) | High | Fast |
| LTO-9 Tape | $0.50-1 (amortized) | Very high | Slow (sequential) |
| Amazon S3 Standard | $23 | Very high (99.999999999%) | Fast |
| Amazon S3 Glacier Deep Archive | $0.99 | Very high | Hours |
| Backblaze B2 | $6 | Very high | Fast |
Hidden Costs
Raw storage price is only part of the equation:
- Egress fees: most cloud providers charge for downloading data, and during a disaster recovery you may need to download everything at once. S3 charges roughly $0.09/GB for egress, so restoring 10 TB costs about $900. Backblaze B2 offers free egress up to 3x the amount of data stored, and Wasabi charges no egress fees at all
- API request costs: frequent incremental backups generate many PUT/GET requests. S3 charges per 1,000 requests, which adds up quickly with small-file-heavy workloads
- Bandwidth: uploading backups to remote locations requires sufficient upload bandwidth. A 10 TB initial backup over a 100 Mbps link takes roughly 9 days
- Management overhead: more complex backup infrastructure requires more time to maintain, monitor, and test
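The egress and bandwidth figures above can be reproduced with a few lines of arithmetic. This sketch uses decimal units (1 TB = 1,000 GB), as cloud providers bill, and ignores protocol overhead:

```python
def egress_cost_usd(data_tb: float, price_per_gb: float) -> float:
    """Cost of downloading a full backup set at a per-GB egress rate."""
    return data_tb * 1000 * price_per_gb

def upload_days(data_tb: float, uplink_mbps: float) -> float:
    """Days needed to push an initial backup over a given uplink,
    ignoring protocol overhead and competing traffic."""
    seconds = data_tb * 1e12 * 8 / (uplink_mbps * 1e6)
    return seconds / 86400

egress_cost_usd(10, 0.09)  # ~$900 to restore 10 TB from S3
upload_days(10, 100)       # ~9.3 days for the initial 10 TB upload at 100 Mbps
```

Running these numbers before choosing a provider avoids discovering the egress bill in the middle of a disaster recovery.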
Cost-Effective 3-2-1 Strategy
For most small to medium deployments:
- Copy 1 (production): already paid for as part of infrastructure
- Copy 2 (local NAS with HDD): one-time hardware cost, low ongoing cost
- Copy 3 (B2 or S3 Glacier Deep Archive): $1-6/TB/month depending on access needs
A 5 TB dataset following this strategy costs roughly $5-30/month for offsite storage, plus the one-time cost of NAS hardware.
Backup Verification
A backup that has never been tested is a hope, not a strategy. Verification should cover both integrity checks and full restore tests.
Integrity Checks
Run automated integrity checks on every backup:
- Checksum verification: tools like restic and borgbackup store checksums for every block of data. Run `restic check` or `borg check` regularly to detect silent corruption
- Snapshot listing: verify that expected snapshots exist and cover the right time range. A backup job that silently stopped running last month is worse than no backup at all
- Size validation: compare backup sizes over time. A sudden drop may indicate missing data. A sudden spike may indicate unwanted files being backed up
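The snapshot-listing check lends itself to automation. The sketch below assumes snapshot timestamps have already been collected, for example by parsing `restic snapshots --json`; the function name and thresholds are illustrative:

```python
from datetime import datetime, timedelta

def snapshot_alerts(snapshot_times: list, now: datetime,
                    max_age: timedelta, min_count: int) -> list:
    """Return alert messages for missing or stale snapshots.
    An empty list means the repository looks healthy."""
    if not snapshot_times:
        return ["no snapshots found"]
    alerts = []
    if len(snapshot_times) < min_count:
        alerts.append(f"only {len(snapshot_times)} snapshots, expected >= {min_count}")
    age = now - max(snapshot_times)
    if age > max_age:
        alerts.append(f"newest snapshot is {age} old, limit is {max_age}")
    return alerts
```

Feeding the result into an alerting system catches the silently stopped backup job the text warns about.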
Automated Restore Testing
Schedule periodic restore tests to verify end-to-end recovery. At minimum, a restore test should restore the latest snapshot to a temporary directory, verify that critical files exist, and confirm that database dumps can be loaded without errors.
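One way to script the file-existence part of such a test, assuming a manifest of SHA-256 checksums recorded at backup time (the manifest format here is hypothetical, not a restic or borg feature):

```python
import hashlib
from pathlib import Path

def verify_restore(restore_dir: Path, manifest: dict) -> list:
    """Compare restored files against checksums recorded at backup time.
    `manifest` maps relative paths to expected SHA-256 hex digests."""
    problems = []
    for rel_path, expected in manifest.items():
        f = restore_dir / rel_path
        if not f.is_file():
            problems.append(f"missing: {rel_path}")
        elif hashlib.sha256(f.read_bytes()).hexdigest() != expected:
            problems.append(f"corrupt: {rel_path}")
    return problems  # an empty list means the restore test passed
```

A cron job can restore the latest snapshot into a scratch directory, run this check over a list of critical paths, and report the result to monitoring.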
Verification Schedule
| Check | Frequency | Automated |
|---|---|---|
| Backup job completion | Every run | Yes (monitoring alerts) |
| Checksum integrity | Weekly | Yes (cron/systemd timer) |
| Snapshot count and age | Daily | Yes (monitoring alerts) |
| Partial restore test | Monthly | Yes (scripted) |
| Full bare-metal restore | Quarterly | Manual |
The quarterly full restore is the most important test. It verifies not just data integrity but the entire recovery procedure: documentation, access credentials, network configuration, and the time it actually takes. Document each test result, including how long the restore took, and update the RTO estimate accordingly.
Common Mistakes
Not testing restores: many organizations back up religiously but never verify that restores actually work. A backup that cannot be restored is not a backup. Schedule periodic restore tests.
Same failure domain: storing the “offsite” copy on another server in the same rack or building does not satisfy the offsite requirement. A single event can destroy both.
No encryption: backups stored offsite or in the cloud must be encrypted. Losing control of unencrypted backup data is a data breach.
No retention policy: keeping only the latest backup means that if corruption goes undetected for days, all backups may contain corrupted data. Use versioned or incremental backups with a retention window.
Ignoring backup integrity: backups should include checksums or verification steps. Silent data corruption (bit rot) can make backups unreadable over time.
Asymmetric encryption without private key management: encrypting backups with a public key (PGP, RSA) is convenient because anyone can create backups without access to the decryption key. However, if the private key is lost, every backup encrypted with it becomes permanently unreadable. Unlike a symmetric passphrase that a human can memorize or write down, a private key is a large binary blob that must be carefully stored and replicated. If you use asymmetric encryption for backups, store the private key in multiple secure locations independent of the backups themselves: a hardware security module, a password manager, and a printed paper copy in a safe. Test decryption regularly to confirm the private key is still accessible. A backup encrypted with a lost private key is identical to no backup at all.
Backing up only databases, not configuration: restoring a database is useless if the application configuration, TLS certificates, cron jobs, firewall rules, and service definitions are lost. A full recovery requires the entire environment, not just the data. Include system configuration, secrets, and infrastructure-as-code definitions in your backup scope.
No monitoring or alerting: a backup job that fails silently for weeks is a common cause of data loss. Without monitoring, the failure is only discovered when a restore is needed and nothing is there. Every backup job should report success or failure to a monitoring system, and missing reports should trigger alerts just like explicit failures.
Relying on RAID as a backup: RAID protects against individual disk failure, but it does not protect against ransomware, accidental deletion, software bugs, or controller failure. RAID is a high-availability mechanism, not a backup strategy. All data on a RAID array can be destroyed by a single `rm -rf`, an encrypted filesystem attack, or a failed controller that corrupts the entire array.
Relying solely on cloud provider snapshots: provider-managed snapshots (EBS snapshots, VM snapshots) are convenient but exist within the same provider and often the same account. An account compromise, billing issue, or provider-side bug can delete snapshots along with the primary data. Snapshots are a useful first layer but must be supplemented with an independent offsite copy.
Backing up running databases without consistency: copying database files from a running instance can produce a corrupted backup. Databases must be backed up using their native dump tools (`pg_dump`, `mysqldump`, `mongodump`) or filesystem snapshots taken while writes are frozen. A backup that cannot be restored due to corruption is worse than no backup because it creates a false sense of security.
Not documenting the restore procedure: even with perfect backups, a restore can fail if no one knows the exact steps, credentials, and order of operations. Document the restore procedure, store it outside the systems being backed up, and verify that someone other than the original author can follow it successfully.
Extended Rules
The original 3-2-1 rule has evolved as threats have changed. Two notable extensions are now widely recommended.
3-2-1-1-0
This extension adds two requirements:
- 1 copy must be air-gapped or immutable (protected from ransomware that can encrypt network-accessible backups)
- 0 errors verified through regular restore testing
Immutability can be achieved through:
- Object lock on cloud storage (S3 Object Lock, B2 immutable buckets)
- WORM (Write Once Read Many) tape
- Offline/air-gapped drives that are disconnected after backup
4-3-2
For critical data, some organizations use the 4-3-2 rule:
- 4 copies of the data
- 3 different storage media types
- 2 offsite copies in different geographic locations
This provides additional redundancy against regional disasters and cloud provider outages.
Practical Implementation
A minimal 3-2-1 setup for a Linux server:
| Copy | Media | Location | Tool |
|---|---|---|---|
| Original | SSD (production server) | Primary DC | - |
| Backup 1 | NAS with HDD (ZFS/RAID) | Primary DC | restic, borgbackup |
| Backup 2 | Cloud object storage | Remote (S3/B2) | restic, rclone |
Key implementation details:
- Use different tools for each backup copy. If a bug in one tool corrupts or silently skips files, the other tool will still produce a valid backup. For example, use borgbackup for local NAS backups and restic for offsite cloud copies. Tool diversity eliminates single points of failure in backup software itself
- Automate backups with cron or systemd timers, never rely on manual execution
- Encrypt all offsite backups at rest (restic and borgbackup do this by default)
- Monitor backup jobs and alert on failures
- Retain multiple versions (daily for 7 days, weekly for 4 weeks, monthly for 12 months is a common starting point)
- Test restores regularly, at minimum once per quarter
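The retention schedule above maps directly to `restic forget --keep-daily 7 --keep-weekly 4 --keep-monthly 12`. For illustration, the bucket logic behind such a policy can be sketched in a few lines; this is a simplification of what restic and borg actually do:

```python
from datetime import date, timedelta

def snapshots_to_keep(snaps, daily=7, weekly=4, monthly=12):
    """Keep the newest snapshot in each of the most recent `daily` days,
    `weekly` ISO weeks, and `monthly` months (simplified GFS retention)."""
    limits = {"daily": daily, "weekly": weekly, "monthly": monthly}
    seen = {kind: set() for kind in limits}
    keep = set()
    for s in sorted(snaps, reverse=True):            # newest first
        buckets = {"daily": s.toordinal(),
                   "weekly": tuple(s.isocalendar()[:2]),  # (year, week)
                   "monthly": (s.year, s.month)}
        for kind, key in buckets.items():
            # A snapshot is kept if it is the newest one in any
            # not-yet-filled daily/weekly/monthly bucket.
            if key not in seen[kind] and len(seen[kind]) < limits[kind]:
                seen[kind].add(key)
                keep.add(s)
    return keep

# 60 consecutive daily snapshots collapse to a handful of keepers:
snaps = [date(2024, 1, 1) + timedelta(days=i) for i in range(60)]
kept = snapshots_to_keep(snaps)
```

In practice, let the backup tool's own `forget`/`prune` commands enforce retention; the point is that old snapshots thin out gradually instead of disappearing all at once.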
Summary
The 3-2-1 rule provides a simple framework that scales from personal workstations to enterprise infrastructure. The exact tools and storage targets will vary, but the underlying principle stays the same: diversify copies, diversify media, and keep data offsite. When in doubt, add more redundancy rather than less. Data that exists in only one place effectively does not exist.
Real-World Data Loss Incidents
These incidents demonstrate what happens when backup strategies fail or are absent entirely.
Toy Story 2 near-deletion (1998)
What happened: Someone ran `rm -rf *` on the Pixar file server hosting the Toy Story 2 production files. The backup system had been failing silently for months without anyone noticing. The film was saved only because a technical director had a copy on her home workstation that she used while working remotely during maternity leave.
Lessons: Backups that are never tested are not backups. Silent failures can persist for months if monitoring and restore tests are absent. An accidental offsite copy saved an entire film production.
Gmail account reset (2011)
What happened: A storage software update caused approximately 150,000 Gmail accounts to be wiped clean: all emails, contacts, labels, and chat history disappeared. Google restored the affected accounts from tape backups within a few days.
Lessons: Even at Google’s scale with massive online redundancy, tape-based offsite backups remain the last line of defense. Software bugs can bypass all online replication layers simultaneously. A different media type (tape) on a separate system saved the data.
GitLab database deletion (2017)
What happened: An engineer accidentally deleted a production PostgreSQL database during maintenance. Of five backup mechanisms in place, none worked correctly: LVM snapshots were never configured, regular database dumps had errors, S3 backups had never been tested, and Azure disk snapshots were not enabled. The recovery relied on a 6-hour-old copy that an engineer happened to have made. GitLab lost approximately 6 hours of production data.
Lessons: Multiple backup systems mean nothing if none of them are verified. Every backup mechanism should be tested regularly with actual restore operations. Redundancy on paper is not redundancy in practice.
VFEmail destruction (2019)
What happened: Attackers formatted every disk on every server of the U.S.-based email provider, destroying almost two decades of data including all backups. No ransom was demanded. The primary and backup systems shared the same access layer, so compromising one gave access to everything.
Lessons: Backups must be isolated from production systems. Shared authentication between primary and backup infrastructure means a single compromise destroys both. Air-gapped or immutable backups would have survived this attack.
Myspace data loss (2019)
What happened: Myspace announced that all photos, videos, and audio files uploaded before 2016 were lost during a server migration. Over 50 million songs from 14 million artists were permanently destroyed.
Lessons: Migrations are high-risk operations that require verified backup copies before execution. Irreplaceable user-generated content demands the same protection level as any critical data.
Garmin WastedLocker ransomware (2020)
What happened: The Evil Corp group encrypted Garmin’s internal systems, taking down Garmin Connect, flyGarmin, and all customer-facing services for nearly five days. Call centers, email, and online chat were all unavailable. Garmin reportedly paid a $10 million ransom to obtain the decryption key.
Lessons: Network-accessible backups are vulnerable to the same ransomware that hits production systems. Immutable or air-gapped backups would have eliminated the need to pay a ransom. The $10 million payment far exceeds the cost of proper backup infrastructure.
OVHcloud Strasbourg fire (2021)
What happened: A fire destroyed the SBG2 data center and damaged SBG1 in Strasbourg. Customers who stored both primary data and backups in the same facility lost everything permanently. Those with offsite backups or multi-region replication recovered.
Lessons: “Offsite” must mean genuinely separate geography, not just another rack or building on the same campus. A cloud provider’s local backup option does not satisfy the offsite requirement of the 3-2-1 rule.
Kyoto University supercomputer data loss (2021)
What happened: A faulty HPE software update meant to clean up old log files instead deleted 77 TB of research data (34 million files) from the supercomputer’s backup storage between December 14 and 16, 2021. The bug affected 14 research groups, and data belonging to 4 of them was irrecoverable.
Lessons: A backup stored on the same system it protects is not a separate copy. Automated maintenance scripts must be tested in isolation before running against production backup storage. Critical research data needs copies on independent systems.
Atlassian cloud data deletion (2022)
What happened: A deployment script intended to decommission one app used incorrect identifiers and deleted data for 883 sites belonging to 775 customers. Jira, Confluence, Opsgenie, and Statuspage were unavailable for affected customers for up to 14 days.
Lessons: Restore procedures must be tested at scale, not just for individual accounts. Atlassian had no automated system to restore a large subset of customers simultaneously, turning a minutes-long deletion into a two-week recovery. Disaster recovery plans must account for mass-restore scenarios.
CloudNordic ransomware (2023)
What happened: Ransomware encrypted all servers of Danish cloud provider CloudNordic, including primary and secondary backup systems. The attack happened during a data center migration when already-infected machines were connected to the internal network. The majority of customers - hundreds of Danish companies - lost all their data permanently.
Lessons: Migrations create windows of vulnerability when systems from different security zones are connected. Both backup tiers were reachable from the same network, so one compromise encrypted everything. At least one backup copy must be offline or on an isolated network segment.
UniSuper Google Cloud deletion (2024)
What happened: A misconfiguration during provisioning caused Google Cloud to delete the entire private cloud subscription of UniSuper, a $125 billion Australian pension fund. The deletion propagated across both geographic regions meant to protect against exactly this scenario.
Lessons: Multi-region replication within a single provider does not protect against provider-level errors. UniSuper recovered because they maintained backups with an independent third-party provider outside of Google Cloud. True 3-2-1 compliance requires at least one copy outside your primary provider.
South Korea national data center fire (2025)
What happened: A fire caused by thermal runaway of an expired UPS battery module at the National Information Resources Service data center in Daejeon burned for nearly 22 hours, taking down 647 government IT systems - roughly one-third of the nation’s online public services. 858 TB of government data was lost, including G-Drive, a cloud system used by 125,000 officials, which had no backups because the volume was deemed too large.
Lessons: No dataset is “too large” to back up - the cost of losing it is always higher. Physical infrastructure failures can destroy an entire site regardless of digital redundancy. Critical government services must have offsite copies on separate media, with no exceptions.
References
- NIST SP 800-34 Rev. 1 - Contingency Planning Guide for Federal Information Systems
- NIST SP 800-209 - Security Guidelines for Storage Infrastructure
- CISA - Data Backup Options
- CISA - Protecting Against Ransomware
- Backblaze - The 3-2-1 Backup Strategy
- restic - Fast, secure, efficient backup program
- BorgBackup - Deduplicating archiver with compression and encryption
- rclone - Rsync for cloud storage
- LTO Program - Linear Tape-Open technology