CDP:Web Storage Archive Pattern
Archiving Large Volumes of Data
Problem to Be Solved
Logs and backup files that individual servers produce in large volumes must be retained for some period of time. Provisioning a high-capacity disk for that purpose is not cost-efficient, and in a growing system it is difficult to estimate in advance how much data will be stored (capacity planning). You can archive the logs from the individual servers to shared storage and rotate them at short intervals, which removes the maintenance work of enlarging each server's disks. However, the same capacity-planning problem then applies to the shared storage.
Explanation of the Cloud Solution/Pattern
AWS provides Internet storage, which has essentially limitless capacity, and you can use this Internet storage as a place to archive logs. This eliminates worries about disk enlargement maintenance or capacity planning in advance, making it easy to archive logs in shared storage. (You still need to perform capacity planning in relation to costs, however.)
Implementation
The Amazon Simple Storage Service (S3) provides high reliability and durability, and is well suited as a location for storing logs. A tool such as s3cmd or s3sync makes uploading to S3 straightforward.
- Use rotation software (such as logrotate) to rotate the various logs written by the EC2 instances. Upload the logs to S3 at a defined point in the rotation cycle by adding the upload procedure to the rotation script.
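The upload step in the rotation script can be sketched as follows. This is a minimal illustration, not the pattern's prescribed tooling: it uses the boto3 library in place of s3cmd or s3sync, and the function names (`archive_key`, `upload_rotated_log`), the `logs/<host>/<date>/` key layout, and the bucket name are all hypothetical choices made for this example.

```python
import socket
import sys
from datetime import datetime, timezone
from pathlib import Path


def archive_key(log_path, hostname, when):
    """Build an object key of the form logs/<host>/<YYYY>/<MM>/<DD>/<file name>.

    The layout is an assumption for this sketch; any scheme that keeps
    files from different servers and dates apart would work.
    """
    return f"logs/{hostname}/{when:%Y/%m/%d}/{Path(log_path).name}"


def upload_rotated_log(s3_client, log_path, bucket, hostname=None, when=None):
    """Upload one rotated log file and return the key it was stored under."""
    hostname = hostname or socket.gethostname()
    when = when or datetime.now(timezone.utc)
    key = archive_key(log_path, hostname, when)
    # upload_file(Filename, Bucket, Key) is the boto3 S3 client method.
    s3_client.upload_file(str(log_path), bucket, key)
    return key


# Runs only when called as: archive_log.py <rotated-log-file> <bucket>
if __name__ == "__main__" and len(sys.argv) == 3:
    import boto3  # third-party package (pip install boto3)

    print(upload_rotated_log(boto3.client("s3"), sys.argv[1], sys.argv[2]))
```

A rotation tool could call a script like this on each rotated file, for example from a logrotate `postrotate` or `lastaction` block. Passing the client in as an argument keeps the upload logic testable without AWS credentials.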
Configuration
Benefits
- Storing the logs in S3 removes the need to worry about disk space and lets you archive them without the risk of loss due to a failure.
- You can use the same system for saving backup files as well.
- While Elastic Block Store (EBS) charges are based on the provisioned volume size, S3 charges are based on the amount actually used. This makes costs track real usage more closely.
Cautions
- If an EBS volume fails before a log rotation, the logs written since the previous rotation will be lost.
- When Auto Scaling is used, you need to store the logs in S3 before the EC2 instance is shut down.
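For the Auto Scaling caution, one possible approach is a final sweep that uploads whatever is still in the log directory just before the instance goes away. The sketch below assumes a boto3-style S3 client; the function name `flush_logs`, the key prefix, and the way the script gets triggered (an instance shutdown script or an Auto Scaling lifecycle hook, for instance) are assumptions for illustration and are not part of the original pattern.

```python
from pathlib import Path


def flush_logs(s3_client, log_dir, bucket, prefix):
    """Upload every regular file under log_dir and return the keys used.

    Keys mirror the directory layout under `prefix`, which would
    typically identify the instance being shut down.
    """
    uploaded = []
    for path in sorted(Path(log_dir).rglob("*")):
        if not path.is_file():
            continue  # skip directories and other non-regular entries
        key = f"{prefix}/{path.relative_to(log_dir).as_posix()}"
        # upload_file(Filename, Bucket, Key) is the boto3 S3 client method.
        s3_client.upload_file(str(path), bucket, key)
        uploaded.append(key)
    return uploaded
```

With a real client this might be called as `flush_logs(boto3.client("s3"), "/var/log/app", "my-log-bucket", "logs/<instance-id>")`, where the directory, bucket, and prefix are placeholders. The sweep includes the current, not yet rotated log file, which the regular rotation upload would otherwise miss.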