CDP: Write Proxy Pattern
High-Speed Uploading to Internet Storage
Problem to Be Solved
Internet storage typically offers very high read throughput and extremely high data durability. However, to maintain redundancy, data is written to multiple locations, and clients communicate with the storage over HTTP. As a result, write speed is relatively slow, which causes a performance problem when writing large amounts of data to Internet storage.
Explanation of the Cloud Solution/Pattern
Rather than sending data directly from the client to Internet storage, have a virtual server receive the data and then forward it to Internet storage. The transfer from the client to the virtual server can use a protocol faster than HTTP (for example, a UDP-based protocol). When there are many small files, they can be archived on the client, sent as a single file, and extracted on the virtual server before being forwarded to Internet storage. If the virtual server and the Internet storage are in the same region, the connection between them uses a dedicated line rather than the public Internet, so the total transfer time through the virtual server can be substantially shorter than transferring to Internet storage directly.
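The client-side archiving step can be sketched with Python's standard `tarfile` module. This is a minimal illustration under stated assumptions: the function names `pack_files` and `unpack_archive` are hypothetical, gzip compression is an arbitrary choice, and the actual transfer (UDP-based or otherwise) and the forwarding to Internet storage are left out.

```python
import io
import tarfile
from pathlib import Path, PurePosixPath


def pack_files(paths: list[Path], root: Path) -> bytes:
    """Client side (hypothetical helper): bundle many small files into one
    gzip-compressed tar archive so they travel as a single large transfer.
    Member names are stored relative to `root`."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for p in paths:
            tar.add(p, arcname=p.relative_to(root).as_posix())
    return buf.getvalue()


def unpack_archive(data: bytes, dest: Path) -> list[Path]:
    """Upload-server side (hypothetical helper): extract the archive so each
    file can be forwarded to Internet storage individually.
    Returns the paths of the extracted files."""
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as tar:
        for member in tar.getmembers():
            name = PurePosixPath(member.name)
            # Only regular files; reject absolute paths and ".." (path traversal).
            if not member.isfile() or name.is_absolute() or ".." in name.parts:
                continue
            target = dest.joinpath(*name.parts)
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(tar.extractfile(member).read())
            extracted.append(target)
    return extracted
```

Bundling replaces many per-file round trips with one large transfer, while extracting on the upload server means each file can still be stored as an individual object.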
Implementation
- Launch an EC2 instance to receive the data, in the same region as the Amazon Simple Storage Service (S3) bucket that is the final destination.
- On the EC2 instance, install an FTP or web server, UDP-based transfer software such as Aspera or TsunamiUDP, or other software that accelerates transfers. This instance is called the "upload server."
- Transfer the data from the client to the upload server. If there are many small files, first combine them into a single archive on the client.
- After the transfer to the upload server completes (or sequentially, as each file arrives), transfer the data from the upload server to S3. If the files were archived on the client, extract them on the upload server before transferring them to S3.
Benefits
- Transfers to S3 can be faster.
- The improvement can be dramatic when uploading to an S3 region in another country.
- Beyond speed, the pattern can improve convenience for users: for example, users can upload files to the EC2 instance over FTP, and the files are then synchronized to S3 automatically, as-is.
Cautions
- Write speed on the upload server (typically the write speed of its EBS volumes) can become the bottleneck. In that case, you may need to stripe data across multiple volumes (see the Ondemand Disk Pattern) to increase write performance.
- Smaller EC2 instances have relatively limited network bandwidth, so use a larger instance type if high throughput is required.
Other
- UDP-based solutions for accelerating data transfer include TsunamiUDP, Aspera, and SkeedSilverBullet.
- To increase write performance to S3, you can also split a file into parts and upload them in parallel (S3's "multipart upload").
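The multipart approach can be sketched as follows. An S3 multipart upload has three steps: initiate the upload, upload each part (S3 returns an ETag per part), and complete the upload with the list of part numbers and ETags. This sketch covers only the middle step, under assumptions: `upload_part` is a hypothetical caller-supplied function (in a real deployment it might wrap the boto3 S3 client's `upload_part` call), and the 8 MiB default part size and four worker threads are arbitrary illustrative values. The 5 MiB minimum part size (for every part except the last) and the 10,000-part limit are S3 constraints.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# S3 multipart limits: every part except the last must be at least 5 MiB,
# and one upload may contain at most 10,000 parts.
MIN_PART_SIZE = 5 * 1024 * 1024
MAX_PARTS = 10_000


def plan_parts(total_size: int, part_size: int = 8 * 1024 * 1024) -> list[tuple[int, int, int]]:
    """Split total_size bytes into (part_number, offset, length) ranges.
    Part numbers start at 1. If the part count would exceed MAX_PARTS,
    the part size is doubled until it fits."""
    if part_size < MIN_PART_SIZE:
        raise ValueError(f"part_size must be at least {MIN_PART_SIZE} bytes")
    while -(-total_size // part_size) > MAX_PARTS:  # ceiling division
        part_size *= 2
    return [
        (number, offset, min(part_size, total_size - offset))
        for number, offset in enumerate(range(0, total_size, part_size), start=1)
    ]


def read_range(path: str, offset: int, length: int) -> bytes:
    """Read one part's byte range; each worker opens its own file handle."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


def upload_file_in_parts(
    path: str,
    upload_part: Callable[[int, bytes], str],  # hypothetical: returns the part's ETag
    part_size: int = 8 * 1024 * 1024,
    max_workers: int = 4,
) -> list[tuple[int, str]]:
    """Upload a file's parts concurrently and return (part_number, etag)
    pairs in part-number order, as needed to complete the upload."""
    parts = plan_parts(os.path.getsize(path), part_size)

    def send(part: tuple[int, int, int]) -> tuple[int, str]:
        number, offset, length = part
        return number, upload_part(number, read_range(path, offset, length))

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, i.e. sorted by part number.
        return list(pool.map(send, parts))
```

Reading each range inside the worker keeps memory use at roughly one part per thread instead of loading the whole file at once.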
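Automatic synchronization from the upload server to S3 can be approximated by periodically scanning the directory the FTP server writes to and forwarding new or changed files. This is a minimal polling sketch, not a production design: `sync_new_files` and the `put_object` callable are hypothetical names (a real `put_object` might wrap an S3 upload call), and a file is treated as complete once it has not been modified for `min_age` seconds, an assumption that may not hold for slow or stalled uploads.

```python
import time
from pathlib import Path
from typing import Callable


def sync_new_files(
    incoming: Path,
    seen: dict[str, tuple[int, float]],
    put_object: Callable[[str, Path], None],  # hypothetical uploader
    min_age: float = 5.0,
) -> list[str]:
    """Forward files under `incoming` that are new or changed since the last scan.

    `seen` maps each object key to the (size, mtime) it had when last forwarded
    and is updated in place. Returns the keys forwarded by this call."""
    forwarded = []
    now = time.time()
    for path in sorted(incoming.rglob("*")):
        if not path.is_file():
            continue
        st = path.stat()
        # Skip recently modified files: the FTP server may still be writing them.
        if min_age > 0 and now - st.st_mtime < min_age:
            continue
        key = path.relative_to(incoming).as_posix()  # object key mirrors the relative path
        signature = (st.st_size, st.st_mtime)
        if seen.get(key) == signature:
            continue  # unchanged since it was last forwarded
        put_object(key, path)
        seen[key] = signature
        forwarded.append(key)
    return forwarded
```

Call it in a loop (for example, `sync_new_files(...)` followed by `time.sleep(10)`), reusing the same `seen` dictionary between calls. An event-driven watcher such as inotify on Linux would avoid rescanning the whole directory each time.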