A service that autoscales Bigtable clusters based on CPU load
This repo is no longer actively maintained. While it should continue to work and there are no major known bugs, we will not be improving Bigtable Autoscaler or releasing new versions.
If you have a Bigtable cluster and would like to optimize its cost-efficiency by running the
right number of nodes at any given time, consider using this Bigtable autoscaler service:
it adjusts the cluster size for you with no manual intervention.
Run this command to build the project and create a docker image:
mvn package
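If the build succeeds, a local Docker image should now exist; assuming it is tagged bigtable-autoscaler (the tag may differ in your setup), you can confirm with:
# check that the image was built (image name is an assumption, adjust to your setup)
docker images | grep bigtable-autoscaler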
First, review and edit .env with your Google Cloud credentials.
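As a sketch, the relevant part of .env might look like the following; the exact variable names are assumptions, so check the .env file shipped with the repo:
# illustrative credentials setup (variable names are assumptions, see the repo's .env)
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account-key.json
export GOOGLE_CLOUD_PROJECT=my-gcp-project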
Start the service with docker-compose using a dockerized local postgres:
# source your environment
. ./.env
# start the service with docker compose
make up
# see service logs
make logs
Register the Bigtable cluster that should be autoscaled in the service:
PROJECT_ID=<YOUR GCP PROJECT ID>
INSTANCE_ID=<YOUR INSTANCE ID>
CLUSTER_ID=<YOUR CLUSTER ID>
curl -v -X POST "http://localhost:8080/clusters?projectId=$PROJECT_ID&instanceId=$INSTANCE_ID&clusterId=$CLUSTER_ID&minNodes=4&maxNodes=6&cpuTarget=0.8"
If the cluster was at 3 nodes, this will immediately rescale it to 4 nodes, as that is the
configured minimum. If you then generate significant load on the cluster, it may scale up to 6 nodes.
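To check what has been registered, the service also exposes a read endpoint; the exact route is an assumption here, so consult the API doc referenced below:
# sketch: list the clusters currently registered for autoscaling
curl "http://localhost:8080/clusters"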
Stop docker-compose:
make down
If you want to run this in production, consider using a Cloud SQL postgres database to store the
state. We recommend connecting using the JDBC socket factory.
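As a sketch, a socket-factory JDBC URL looks roughly like this, with the placeholders filled in for your database and Cloud SQL instance:
# example JDBC URL using the Cloud SQL postgres socket factory
jdbc:postgresql:///<DB_NAME>?cloudSqlInstance=<PROJECT:REGION:INSTANCE>&socketFactory=com.google.cloud.sql.postgres.SocketFactory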
Just update .env with your PostgreSQL URL, user and password, and then run:
# source your environment
. ./.env
# start the service with docker compose
make run
This runs the same bigtable-autoscaler image, but does not start a local postgres; instead it points bigtable-autoscaler at the PostgreSQL database you provided.
As before, you can view the service logs with make logs. To stop the service:
make stop
You can register any additional JAX-RS resource, JAX-RS or Jersey contract provider, or JAX-RS feature by editing the
config file. You can either set additionalPackages to have resources discovered by package scanning
(for this to work, the resources to be discovered must be annotated), or list them explicitly in
additionalClasses (semicolon separated).
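For illustration, the corresponding config entries might look like this; the key names come from above, but the exact syntax and the com.example class names are assumptions to adapt to your setup:
# illustrative config entries (syntax and class names are assumptions)
additionalPackages: com.example.autoscaler.resources
additionalClasses: com.example.MyResource;com.example.MyFeature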
The Bigtable autoscaler is a backend service that periodically sends
resize commands to Bigtable clusters. It is backed by a PostgreSQL database that
keeps its state, for example each registered cluster's minimum and maximum node counts, CPU target, and the time of its last resize.
The autoscaler checks the database every 30 seconds and decides whether it should
act (there are time thresholds so that clusters are not resized too often).
When it is time to check a cluster, it fetches the current CPU utilization
from the Bigtable API. If that differs from the target CPU utilization
(again within thresholds), it calculates the adequate number of nodes
and sends a resize request.
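As a simplified sketch of that calculation (the real algorithm also applies the thresholds and min/max bounds mentioned above): if a 4-node cluster reports 90% CPU utilization against an 80% target, the proportional node count is 4 × 0.9 / 0.8 = 4.5, which rounds up to 5 nodes.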
The autoscaler also provides an HTTP API to register, update, and remove the Bigtable
clusters that it autoscales.
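For illustration, update and delete calls might look like the following; treat the exact routes and parameters as assumptions and check the API doc referenced below:
# sketch: update a registered cluster's settings (route and parameters assumed)
curl -X PUT "http://localhost:8080/clusters?projectId=$PROJECT_ID&instanceId=$INSTANCE_ID&clusterId=$CLUSTER_ID&minNodes=4&maxNodes=6&cpuTarget=0.7"
# sketch: stop autoscaling a cluster
curl -X DELETE "http://localhost:8080/clusters?projectId=$PROJECT_ID&instanceId=$INSTANCE_ID&clusterId=$CLUSTER_ID"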
Beta: We are using Bigtable Autoscaler in production clusters at Spotify, and we are actively developing it.
Not on its own. In order to not overwhelm Bigtable, you can PUT to the /clusters/override-min-nodes/
endpoint, passing a number that overrides the minimum node count; the autoscaler respects the override immediately. The official Google documentation states that if you are running big batch jobs, you should rescale in advance and wait up to 20 minutes before starting the actual job.
Additionally, when you decrease the number of nodes in a cluster to scale down after the job is complete, try not to reduce the cluster size by more than 10% in a 10-minute period. Scaling down too quickly can cause performance problems, such as increased latency, if the remaining nodes in the cluster become temporarily overwhelmed.
We realize that this can be inconvenient and welcome any ideas on how to approach this problem better.
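For example, raising the floor before a heavy batch job might look like this; the parameter name is an assumption, so check the API doc:
# sketch: temporarily force a higher minimum node count (parameter name assumed)
curl -X PUT "http://localhost:8080/clusters/override-min-nodes?projectId=$PROJECT_ID&instanceId=$INSTANCE_ID&clusterId=$CLUSTER_ID&minNodesOverride=10"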
Yes.
Since July 1st 2018 Google enforces storage limits on Bigtable nodes. In particular, each Bigtable node can handle at most 8 TB on HDD clusters and 2.5 TB on SSD clusters (for more info take a look here). Writes will fail if these limits are exceeded. The autoscaler makes sure that these constraints are respected, preferring them over the CPU target in that situation.
No!
A resize command may fail if you don’t have enough quota in the GCP project. This will be logged
as an error.
Yes!
We increased the project's modularity, so you can create your own custom strategy in a project
that uses the Bigtable Autoscaler as a dependency and implements the "Algorithm" class.
If you add the class path of your new custom strategy to the extra_enabled_algorithms column, it
will be considered when upscaling the cluster.
Note that the recommended number of nodes will be the highest among the strategies in this
project (CPU + storage constraints) and your custom strategies.
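As a sketch, enabling a custom strategy could look like the following; the table name autoscale and the class name are assumptions for illustration, so check the schema in your deployment:
# sketch: enable a custom algorithm for one cluster (table name and class are assumptions)
psql "$POSTGRES_URL" -c "UPDATE autoscale SET extra_enabled_algorithms = 'com.example.MyCustomAlgorithm' WHERE cluster_id = '$CLUSTER_ID';"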
See the API doc
This project adheres to the
Open Code of Conduct.
By participating, you are expected to honor this code.