Imagine you have built a website that solves mathematical equations. Your server receives an equation as input, calculates the solutions, and returns them. It runs on a machine with 1 GB of RAM and a 2-core CPU, and it gets about 100 requests per hour, which it can answer within an acceptable time.
Then your website suddenly becomes popular in academic circles and starts receiving thousands of requests per hour. Responses slow down heavily, and some requests even fail.
To fix this, you need to scale your servers. There are two approaches to scaling: vertical and horizontal.
Vertical scaling (or scaling up) means upgrading your current machine or switching to a better one: a machine with more RAM, more CPU cores, more storage, and better networking hardware.
But you can't scale vertically forever. There is a physical limit on how large a single machine can be, and putting everything on one bigger machine concentrates the risk: if that machine goes down, nobody can visit your website until the machine and your server are back up. In other words, it is a single point of failure.
To fix this, we have the horizontal scaling (or scaling out) approach. We create a pool of machines with the same configuration and distribute incoming requests among them. Even if one of the machines goes down, its load can be spread over the remaining machines. This takes care of the single point of failure.
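To make the idea concrete, here is a minimal sketch of distributing requests over a pool in round-robin order, skipping machines that are marked as down. The class name `ServerPool`, its methods, and the server names are all hypothetical and chosen only for illustration. A real load balancer would also run health checks, handle concurrency, and forward the requests itself.

```python
class ServerPool:
    """Round-robin pool of identically configured machines (illustrative only)."""

    def __init__(self, servers):
        self.servers = list(servers)      # fixed order used for rotation
        self.healthy = set(self.servers)  # machines currently able to serve
        self._next = 0                    # index of the next candidate

    def mark_down(self, server):
        """Stop sending requests to a machine that has failed."""
        self.healthy.discard(server)

    def mark_up(self, server):
        """Resume sending requests to a machine that has recovered."""
        if server in self.servers:
            self.healthy.add(server)

    def pick(self):
        """Return the next healthy machine in round-robin order."""
        if not self.healthy:
            raise RuntimeError("no healthy servers available")
        # Check each machine at most once, starting from the rotation pointer.
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")


# Hypothetical machine names, assumed for this example.
pool = ServerPool(["server-a", "server-b", "server-c"])
before_failure = [pool.pick() for _ in range(4)]

pool.mark_down("server-b")  # simulate one machine going down
after_failure = [pool.pick() for _ in range(4)]

print(before_failure)
print(after_failure)
```

After `server-b` is marked down, requests keep flowing to the two remaining machines, which is the property that removes the single point of failure.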
In practice, we use both horizontal and vertical scaling. Tools like Kubernetes, AWS Auto Scaling groups (ASG), and AWS load balancers (LB) make both much easier to manage.