CDP:Snapshot Pattern
Data Backups
Problem to Be Solved
Keeping your data safe matters more than anything else, which means you need to back it up. For example, you might back up data to tape and store it at a separate location. However, tape backup carries the cost of changing and storing tapes, and these operations are difficult to automate. Expensive equipment can provide some degree of automation, but tapes inherently have limited capacity, so complete automation remains difficult.
Explanation of the Cloud Solution/Pattern
The AWS Cloud lets you use the limitless capacity of "Internet storage" (also known as "Web Storage"), doing so safely and with relatively little expense.
In the AWS Cloud, we often talk about a "snapshot," which is a backup copy of your data at a given point in time. The AWS Cloud lets you easily copy the data of a virtual server (including the operating system), along with other data, to Internet storage, so taking snapshots at regular intervals is simple. You can take a snapshot with one click in the AWS Management Console, or you can take one through an API, which means you can automate it with a program. Because Internet storage frees you from worrying about capacity, you can automate the backup process by having a program take snapshots on a regular schedule.
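As a minimal sketch of the API route, the function below requests one snapshot of a volume. It assumes the boto3 SDK for Python, whose EC2 client provides `create_snapshot`; the function name `take_snapshot` and the description format are hypothetical choices made for illustration. The client is passed in as an argument so the function can be exercised without AWS credentials.

```python
from datetime import datetime, timezone


def take_snapshot(ec2, volume_id, now=None):
    """Request a point-in-time snapshot of one EBS volume.

    `ec2` is assumed to be a boto3 EC2 client, e.g. boto3.client("ec2").
    Returns the ID of the snapshot that was started.
    """
    now = now or datetime.now(timezone.utc)
    # Hypothetical description format; any text that helps you find the
    # snapshot later would do.
    description = f"backup of {volume_id} at {now:%Y-%m-%dT%H:%M:%SZ}"
    response = ec2.create_snapshot(VolumeId=volume_id, Description=description)
    return response["SnapshotId"]
```

Calling this function from a scheduler such as cron is one way to take snapshots at regular intervals.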
When checking a program update or building a temporary test environment, you need to create the environment from the data as it existed at a specific point in time. In this case, you need to copy not just the data but the OS as well. A snapshot suits this case too, because it copies the entire volume, operating system included.
Implementation
The Elastic Block Store (EBS), which is the virtual storage in AWS, has a snapshot function. Make a snapshot using this function. When you take a snapshot, it is stored in the Amazon Simple Storage Service (S3) object storage, which is designed to have 99.999999999% durability.
When the EBS snapshot function is used, all of the data required to recreate the EBS volume is copied to S3. A snapshot that has been stored in S3 can be restored as a new EBS volume. Even if data on the EBS volume is lost or corrupted, you can recover the data as it was when the snapshot was taken.
When you use EBS as a data disk, you can back up your data at any time by taking a snapshot. You can take as many new backups as necessary, whenever necessary, without having to worry about storage capacity.
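Restoring a snapshot as a new volume can be sketched as follows. This assumes a boto3 EC2 client, which offers `create_volume` and a `snapshot_completed` waiter; the function name `restore_volume` is hypothetical.

```python
def restore_volume(ec2, snapshot_id, availability_zone):
    """Create a new EBS volume from a stored snapshot.

    `ec2` is assumed to be a boto3 EC2 client. The volume should be created
    in the Availability Zone of the instance that will attach it.
    """
    # Block until the snapshot has finished being stored.
    ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snapshot_id])
    response = ec2.create_volume(
        SnapshotId=snapshot_id, AvailabilityZone=availability_zone
    )
    return response["VolumeId"]
```

The returned volume ID can then be attached to an instance in place of the lost or corrupted volume.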
When you use EBS as a boot disk, you can copy the entire operating system and store it as an Amazon Machine Image (AMI). You can then launch a new EC2 instance from that image.
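The boot-disk case might look like the sketch below, which assumes a boto3 EC2 client with its `create_image`, `run_instances`, and `image_available` waiter. The function name `image_and_launch` and its parameters are hypothetical. By default `create_image` reboots the source instance before imaging it, which is left unchanged here.

```python
def image_and_launch(ec2, instance_id, image_name, instance_type):
    """Store an instance's boot disk as an AMI, then launch a copy of it.

    `ec2` is assumed to be a boto3 EC2 client.
    Returns (image_id, new_instance_id).
    """
    image_id = ec2.create_image(InstanceId=instance_id, Name=image_name)["ImageId"]
    # The AMI cannot be launched until its underlying snapshots are ready.
    ec2.get_waiter("image_available").wait(ImageIds=[image_id])
    launched = ec2.run_instances(
        ImageId=image_id, InstanceType=instance_type, MinCount=1, MaxCount=1
    )
    return image_id, launched["Instances"][0]["InstanceId"]
```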
Configuration
Benefits
- Taking backups can be controlled by a computer program. That is, you can automate the process, rather than having to make backups manually.
- You can use S3, which has high durability, as the backup destination.
- You can back up all of the data on the EBS volume as a snapshot, which can be used immediately to create a new EBS volume. This makes recovery easy in the event of a failure.
- You can back up not just user data but the entire operating system as well. Because the OS backup is stored as an AMI, you can launch new EC2 instances from it.
- You can capture data at specific moments (for example, after replacing an application or after updating data) and keep multiple generations, without having to worry about storage capacity. This lets you rebuild the environment easily after a failure or a problem and return to the environment as it was at any of those points in time.
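When several generations of snapshots exist, returning to a given point in time means picking the newest snapshot taken at or before that moment. The sketch below shows one way to do this. It assumes each snapshot is a dictionary with `SnapshotId` and `StartTime` keys, the shape boto3's `describe_snapshots` returns; the function name `latest_snapshot_before` is hypothetical.

```python
from datetime import datetime, timezone


def latest_snapshot_before(snapshots, moment):
    """Return the newest snapshot started at or before `moment`, or None.

    Each snapshot is assumed to be a dict with "SnapshotId" and a
    timezone-aware datetime under "StartTime".
    """
    candidates = [s for s in snapshots if s["StartTime"] <= moment]
    return max(candidates, key=lambda s: s["StartTime"], default=None)
```

The selected snapshot's ID can then be used to create the replacement volume.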
Cautions
- You must maintain data consistency when taking snapshots. When you take a snapshot with the EBS volume mounted, make sure to take the snapshot in a state where logical consistency has been achieved, for example, after flushing the cache of the file system (EXT or NTFS), and after application transactions have been completed.
- Typically, the smaller the data size of the boot disk, the more rapidly the virtual server can be started. Note that disk checks that are performed periodically (fsck in Linux) also take time.
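For the consistency caution above, one possible approach on Linux is to freeze the file system for the moment the snapshot is requested. The sketch below assumes the `fsfreeze` utility from util-linux is installed, that the script runs with root privileges, and that the mount point is a data volume rather than the root file system. The name `frozen_filesystem` and the injectable `run` parameter are hypothetical, and application-level transactions still need to be completed separately.

```python
import subprocess
from contextlib import contextmanager


@contextmanager
def frozen_filesystem(mount_point, run=subprocess.run):
    """Suspend writes to a mounted file system while the body executes."""
    # fsfreeze flushes pending writes to disk and blocks new ones.
    run(["fsfreeze", "--freeze", mount_point], check=True)
    try:
        yield
    finally:
        # Always unfreeze, even if the snapshot request raised an error.
        run(["fsfreeze", "--unfreeze", mount_point], check=True)
```

The snapshot request would go inside the `with frozen_filesystem(...)` block, keeping the frozen period as short as possible.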
Other
- You may want to put the boot partition and the data partition on separate EBS volumes, because you will probably want to back up your data more often than your boot disk.