CDP:Deep Health Check Pattern
System Health Check
Problem to Be Solved
A Load Balancer's health check function lets it evaluate the status of the servers registered with it before it distributes requests to them.
Consider a configuration with a web server, a proxy server, an AP server, and a DB server, where a Load Balancer sits in front of the web server. The Load Balancer can evaluate the status of the web server and stop sending it traffic if it is malfunctioning. However, the Load Balancer cannot see the status of the back-end servers: the proxy server, the AP server, and the DB server.
Explanation of the Cloud Solution/Pattern
Point the health check function of the cloud Load Balancer at a dynamic page (that is, a program) written in PHP, as a Java Servlet, or the like. The program checks the operation of the proxy servers, AP servers, DB servers, and so on, and returns the result to the Load Balancer. This makes it possible to check the health of the system as a whole.
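As an illustration of such a program, here is a minimal sketch in Python using only the standard library. It is not the pattern's reference implementation: the `/health` path, the names `run_checks` and `make_handler`, and the idea of passing in one check function per back-end tier are all assumptions made for this example. The handler answers 200 only when every check passes and 503 otherwise, and it includes a per-server result in the response body.

```python
import json
from http.server import BaseHTTPRequestHandler
from typing import Callable, Dict, Tuple

# Hypothetical type: a check returns True when its back-end tier responds.
Check = Callable[[], bool]


def run_checks(checks: Dict[str, Check]) -> Tuple[int, Dict[str, str]]:
    """Run every check and return (HTTP status, per-dependency result)."""
    results: Dict[str, str] = {}
    for name, check in checks.items():
        try:
            results[name] = "ok" if check() else "fail"
        except Exception as exc:  # a raised error counts as a failed check
            results[name] = f"fail: {exc}"
    # 200 only if every dependency is healthy; otherwise 503.
    status = 200 if all(r == "ok" for r in results.values()) else 503
    return status, results


def make_handler(checks: Dict[str, Check]):
    """Build a request handler class that serves the (assumed) /health URL."""

    class HealthHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/health":
                self.send_error(404)
                return
            status, results = run_checks(checks)
            body = json.dumps(results).encode("utf-8")
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # keep frequent health check requests out of the log

    return HealthHandler
```

To try it locally, pass the handler to `http.server.HTTPServer`, for example `HTTPServer(("0.0.0.0", 8080), make_handler({"db": check_db})).serve_forever()`, where `check_db` stands for whatever function you write to test the database connection.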
The health check function of Elastic Load Balancing (ELB), the Load Balancer service in AWS, judges a server's status by whether HTTP(S) access to a specified URL succeeds. To take advantage of this, set the health check destination to a dynamic page. The following is an example implementation for a system made up of a web server, a proxy server, an AP server, and a DB server.
- Launch an ELB and enable its health check function.
- Create a program that runs on the AP server and that accesses the database when it runs.
- Set the health check URL in the ELB to a URL that reaches the program, so that each health check request runs it.
- Execute the health check from the ELB.
Benefits
- You can check all of the servers required for system operation.
- The program that responds to the health check can be written to perform a close process (that is, stop accepting requests) or to return customized error information, depending on the details of the failure.
Cautions
- If there is a large number of servers, the health checks themselves add to the traffic, so you must choose the health check interval carefully.
- If the DB server is a single point of failure (SPOF) and goes down, the health check may overreact and cause all of the servers to be taken out of service, depending on how the back-end server check program is written.
- Use the DB Replication Pattern together with this pattern so that the DB server does not become a SPOF.
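
One possible way to limit the extra load that frequent health checks place on the back-end servers is to cache each check result for a short time, so that a burst of health check requests triggers at most one real back-end check per interval. The sketch below is an assumption for illustration, not part of the pattern itself: the class name `CachedCheck`, the `ttl_seconds` parameter, and the injectable `clock` are all hypothetical. It does not address the SPOF problem, which still calls for the DB Replication Pattern.

```python
import time
from typing import Callable, Optional


class CachedCheck:
    """Wrap a back-end check so it actually runs at most once per ttl_seconds."""

    def __init__(
        self,
        check: Callable[[], bool],
        ttl_seconds: float = 10.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._check = check
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the behavior can be tested
        self._last_result: Optional[bool] = None
        self._last_run = 0.0

    def __call__(self) -> bool:
        now = self._clock()
        expired = now - self._last_run >= self._ttl
        if self._last_result is None or expired:
            try:
                self._last_result = bool(self._check())
            except Exception:
                # Treat an error in the check as an unhealthy back end.
                self._last_result = False
            self._last_run = now
        return self._last_result
```

The trade-off is detection delay: with a cache of `ttl_seconds`, a back-end failure can go unreported for up to that long in addition to the Load Balancer's own check interval, so the two values should be chosen together.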