Auto Scaling and Reliability

Effective auto scaling is a critical technique for keeping your environment up and running. FP Complete has the experience to build a solution that effectively monitors and dynamically allocates compute resources to maintain performance for applications hosted in your multi-cloud environment.

Does this eliminate the possibility of downtime? No. But with a combination of best-in-class DevOps practices and a reliable solution stabilizing your environment, effective auto scaling protects against both machine and network failure.

Let’s say you have a web service. On a typical day, it receives about 10 requests per second. Your application, and the hardware it’s deployed on, can serve 50 requests per second easily. How many machines do you need?

Now, let’s change it up a bit. You expect to get monthly bursts of traffic where, for a six-hour period, you’ll receive 400 requests per second. How many machines do you need?

If your answer is one machine, or eight machines, you should read about auto scaling.
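
For reference, here is the naive arithmetic behind those two answers (a minimal sketch; the request rates and per-machine capacity come from the example above):

    import math

    def machines_needed(peak_rps: float, rps_per_machine: float) -> int:
        """Round up: a fractional machine still means renting one more."""
        return math.ceil(peak_rps / rps_per_machine)

    print(machines_needed(10, 50))   # typical day: 1 machine
    print(machines_needed(400, 50))  # monthly burst: 8 machines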

Let’s take the first case. There are no bursts in requests. You know with 100% certainty that you will never receive more than 10 requests per second, and also with 100% certainty that your app/hardware combo can serve 50 requests per second. Is one machine sufficient? If you want high availability, the answer is no!

Machine failure is not a statistical anomaly. It is a fact of life that must be planned for. Hardware can break, a cloud provider can have a configuration glitch, or any number of other things can go wrong.

If you have a service where downtime is acceptable, one machine is sufficient. If you have a good monitoring and alerting system, you’ll get a 2am SMS, roll out of bed, and spin up a new machine.

But if that 2am SMS doesn’t sound pleasant, or you don’t want to have that downtime, you need at least two machines. On a typical cloud provider, you’ll want to ensure that the two machines are living in two different availability zones, so that if one zone experiences an outage, you’ll still have a server running in another zone.

Does this eliminate the possibility of downtime? No. But you’ve drastically reduced the likelihood of it happening, since with two machines, two statistically unlikely events would have to occur simultaneously. And if you bump this up to three machines, you really are talking about a statistical anomaly.
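
To put rough numbers on this (purely illustrative; real failures are not perfectly independent, especially within a single availability zone, which is exactly why spreading machines across zones matters):

    # Hypothetical figure: suppose each machine is independently down 0.1% of the time.
    p = 0.001
    print(f"one machine down:    {p:.4%}")     # 0.1000% of the time
    print(f"two machines down:   {p**2:.6%}")  # 0.000100%, about one in a million
    print(f"three machines down: {p**3:.8%}")  # 0.00000010%, about one in a billion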

One of the big innovations of cloud computing is on-demand resources. Understanding how to plan for bursty workloads is critical to keeping applications up and keeping costs down.

A great example is being able to get the resources you need only when you need them. Why pay for extra machines all month when your load only increases for six hours a month? Renting eight machines for the entire month costs $160, while renting eight machines for those six hours and only two for the rest of the month costs about $41.

While $119 a month may not seem like a lot, across a large project savings like this add up quickly.
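
Here is the arithmetic behind those figures (a minimal sketch; the roughly $20-per-month machine price is back-calculated from the numbers above and stands in for whatever your provider actually charges):

    HOURS_PER_MONTH = 720        # a 30-day month
    RATE = 20 / HOURS_PER_MONTH  # ~$0.028 per machine-hour, i.e. ~$20 per month

    always_on = 8 * 20                                             # 8 machines, all month
    auto_scaled = 8 * 6 * RATE + 2 * (HOURS_PER_MONTH - 6) * RATE  # 8 for the burst, 2 otherwise

    print(f"always on:   ${always_on:.0f}")                # $160
    print(f"auto scaled: ${auto_scaled:.0f}")              # $41
    print(f"saved:       ${always_on - auto_scaled:.0f}")  # $119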

Effective auto scaling automatically adjusts a service’s capacity up or down based on changes in demand. That can mean adding or removing actual machines, or, with an orchestration system, making better use of the machines you already have.

Auto scaling is the technique of automatically adding and removing machines from your cluster based on some conditions. Some typical conditions for increasing capacity would be:

  • Average CPU load across all machines in the cluster has been above 80% for over 5 minutes
  • Average HTTP response time has exceeded a certain threshold for more than 5% of requests (both conditions are sketched in code below)
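
As a toy illustration of how conditions like these might translate into a scaling decision (the thresholds, step size, and two-to-eight machine bounds are hypothetical; in practice this logic usually lives in your cloud provider’s auto scaling groups or an orchestrator such as Kubernetes):

    def desired_capacity(current: int, avg_cpu: float, slow_request_pct: float,
                         min_machines: int = 2, max_machines: int = 8) -> int:
        """Toy scaling policy mirroring the two conditions listed above.

        avg_cpu: average CPU load across the cluster over the last 5 minutes (0.0-1.0)
        slow_request_pct: fraction of requests slower than the response-time threshold
        """
        if avg_cpu > 0.80 or slow_request_pct > 0.05:
            return min(current + 1, max_machines)  # scale up, one machine at a time
        if avg_cpu < 0.40 and slow_request_pct < 0.01:
            return max(current - 1, min_machines)  # scale down, never below the floor
        return current                             # otherwise hold steady

Real systems also add a cooldown period between scaling actions so that a brief spike doesn’t cause thrashing.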

By contrast, machines can be removed if the reverse is true, or if a machine in the cluster fails some kind of liveness check. This both solves the bursty workload issue and provides auto-recovery from machine failure.
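
What that liveness check looks like varies by setup; here is a minimal sketch of the common HTTP flavor (the health endpoint URL is hypothetical, standing in for whatever your application exposes):

    import urllib.request

    def is_alive(url: str, timeout: float = 2.0) -> bool:
        """A machine counts as healthy if its health endpoint answers 200 in time."""
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                return resp.status == 200
        except OSError:  # covers connection errors, timeouts, and HTTP error codes
            return False

    # e.g. is_alive("http://10.0.1.17/health") == False would trigger a replacement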

Our standard deployment workflows, based on tools like Terraform and Kubernetes, create a cluster of auto-recovering, auto-scaling machines able to host multiple applications.
