GCP 101: How Autoscaling Works in Google Cloud

Posted by Doug Sainato, Enterprise Cloud Account Executive

Jun 18, 2020


Does increased traffic in your pay-as-you-go infrastructure make you nervous? It shouldn’t, particularly if you’re operating in a Google Cloud Platform environment. You have the power to manage traffic going to your virtual machines (VMs) and Kubernetes clusters.

Are you looking at this blog and wondering, “What exactly does this all mean?” Never fear. That’s what this ongoing GCP 101 series is all about. We’re providing a solid introduction to Google Cloud Platform and all of the features that you can use to architect a cloud environment that works for your organization.

We recently covered how to use Google Compute Engine to run apps on virtual machines hosted on physical servers. Now we’re going to take a look at a related feature, autoscaling, and how it helps you make the most of your traffic patterns and cloud budget.

What is Autoscaling in Google Cloud?

Autoscaling is a tool that allows your apps to efficiently handle increases in traffic by dynamically adding compute capacity, and to reduce capacity and costs during periods of low traffic and resource demand.

Google Compute Engine uses managed instance groups (MIGs), collections of identical VM instances created from the same API resource, known as an instance template, to automatically add or remove instances based on the traffic and demand your application receives. Because the VMs in a MIG are identical, they deliver reliable availability and performance for the same application. They’re managed as a single entity, which is a perfect scenario for using autoscaling.

Similarly, Google Kubernetes Engine (GKE), a managed Kubernetes service, and Cloud Run, a fully managed compute platform for deploying containerized applications, both support autoscaling to automatically resize the number of nodes or container instances based on the demands of your workloads. See our ongoing Kubernetes 101 series for more details. 

Load balancing is also an important part of the equation. It routes and distributes traffic to the closest VMs or to the VMs in the MIG with the most available capacity.

Load balancing also uses health checks to detect unhealthy VMs in the MIG and stop sending them traffic, and it resumes routing to instances once they become healthy again.
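To make the routing idea concrete, here is a minimal Python sketch of choosing a backend. It is a simplified illustration, not how Google Cloud Load Balancing is actually implemented: the `Backend` class, its fields, and the `pick_backend` function are hypothetical names invented for this example, and proximity to the user is ignored.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Backend:
    """Hypothetical view of one VM in a MIG, as a load balancer might see it."""
    name: str
    healthy: bool        # result of the most recent health check
    capacity_rps: int    # requests per second the VM can serve
    current_rps: int     # requests per second it is serving now

    @property
    def available(self) -> int:
        return self.capacity_rps - self.current_rps


def pick_backend(backends: List[Backend]) -> Optional[Backend]:
    """Return the healthy backend with the most spare capacity, or None."""
    candidates = [b for b in backends if b.healthy and b.available > 0]
    if not candidates:
        return None
    return max(candidates, key=lambda b: b.available)


vms = [
    Backend("vm-a", healthy=True, capacity_rps=100, current_rps=90),
    Backend("vm-b", healthy=True, capacity_rps=100, current_rps=40),
    Backend("vm-c", healthy=False, capacity_rps=100, current_rps=0),
]
print(pick_backend(vms).name)  # vm-b
```

In this sketch, vm-c has the most spare capacity on paper, but it is skipped because it failed its health check.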

How Does Autoscaling in Google Cloud Work?

To launch autoscaling at the most basic level, you need to determine your target CPU utilization level. This is the level at which you want to keep your VM and container instances running.

Ask yourself what percentage you would like to reach, whether it’s 50%, 75%, or another level.

Once you determine this, you will need to create an autoscaling policy centered on your target utilization level.

Autoscaling will then continuously monitor your MIG and collect usage information based on the policy you created. 

It will compare actual utilization with your target to determine whether the group needs to be scaled up or down. It then adds or removes VM instances from the MIG to keep CPU utilization as close to your specified level as possible.
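The comparison described above can be sketched in a few lines of Python. This is a simplified proportional model and an assumption on our part, not Google's actual autoscaler algorithm. The function name `recommended_size` and its parameters are hypothetical, and utilization is expressed as whole-number percentages to keep the arithmetic exact.

```python
def recommended_size(current_instances: int,
                     actual_pct: int,
                     target_pct: int,
                     min_instances: int = 1,
                     max_instances: int = 10) -> int:
    """Estimate how many instances keep average CPU near the target.

    Assumes load spreads evenly, so the total load is
    current_instances * actual_pct, and each instance should carry
    at most target_pct of it.
    """
    if not 0 < target_pct <= 100:
        raise ValueError("target_pct must be between 1 and 100")
    if current_instances <= 0:
        return min_instances
    total_load = current_instances * actual_pct
    needed = -(-total_load // target_pct)  # integer ceiling division
    # Never go below or above the limits set in the policy.
    return max(min_instances, min(max_instances, needed))


print(recommended_size(4, actual_pct=90, target_pct=60))  # 6 (scale up)
print(recommended_size(4, actual_pct=30, target_pct=60))  # 2 (scale down)
```

The minimum and maximum limits matter in practice: they stop the group from shrinking to nothing during a quiet period or growing without bound during a spike.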

You can also set up autoscaling based on load balancing serving capacity, which can be defined in terms of utilization or requests per second. You define the serving capacity of an instance through a backend service, a set of configuration values that tells the load balancer how to connect to backends, along with various distribution and session settings. We will get into more depth on this in a future blog.
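For the requests-per-second case, the arithmetic might look like the following Python sketch. It is an illustrative assumption rather than the exact calculation Google Cloud performs, and the names `instances_for_rps`, `max_rps_per_instance`, and `target_pct` are hypothetical stand-ins for the serving capacity you would define in a backend service and the target you would set in an autoscaling policy.

```python
def instances_for_rps(total_rps: int,
                      max_rps_per_instance: int,
                      target_pct: int) -> int:
    """Instances needed so each one serves at most target_pct percent
    of its maximum requests per second."""
    if max_rps_per_instance <= 0 or not 0 < target_pct <= 100:
        raise ValueError("capacity and target must be positive")
    if total_rps <= 0:
        return 0
    # ceil(total_rps / (max_rps_per_instance * target_pct / 100))
    return -(-(total_rps * 100) // (max_rps_per_instance * target_pct))


# 1,000 requests per second, 100 per instance at most, 80% target:
# each instance should handle about 80, so 12.5 rounds up to 13.
print(instances_for_rps(1000, max_rps_per_instance=100, target_pct=80))  # 13
```

Setting the target below 100% leaves headroom on each instance, so a sudden burst can be absorbed while new instances are still starting up.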

What are the Benefits of Autoscaling in Google Cloud?

Being able to add or remove VMs based on resource demand and traffic allows you to build a resilient, cost-effective Google Cloud infrastructure that uses just the right amount of resources at the right time for your application’s workload.

This intelligent, dynamic scaling tool helps you keep actual spend in line with your budget, reducing expenses and limiting pay-as-you-go surprises, even when unexpected spikes occur. It helps ensure you have the right number of Google Compute Engine instances available at any given time to handle an app’s workload.



Doug Sainato, Enterprise Cloud Account Executive

Across his 20-plus-year tech career, Doug Sainato has helped organizations get the most out of the cloud. He has served as a business analyst, sales/solution engineer and sales account executive, roles that reflect his lifelong love of analytical problem-solving. It comes in handy more often than not in the tech world, as he can attest. When he joined Onix six years ago, he started as a Google Apps Solution Engineer, a role that helped him quickly develop a passion for cloud infrastructure and all of the possibilities it offers to organizations launching a cloud journey. He’s an original member of Onix’s GCP team and has held sales, consulting and leadership roles. When his head is out of the cloud, Doug enjoys listening to the Beatles, visiting the beach and hoping to finally catch a big fish.
