A Kubernetes-aware cloudpool proxy that offers graceful scale-down functionality
The `kubeaware-cloudpool-proxy` is a proxy that is placed between a cloudpool and its clients (for example, an autoscaler). In essence, the `kubeaware-cloudpool-proxy` adds Kubernetes-awareness to an existing cloudpool implementation. This Kubernetes-awareness allows worker node scale-downs to be handled with less disruption: the proxy takes the current Kubernetes cluster state into account, carefully selects a node, and evacuates its pods prior to terminating the cloud machine, instead of just brutally killing a worker node that, at least from the Kubernetes perspective, appears to be chosen at random.
The `kubeaware-cloudpool-proxy` delegates all cloud-specific actions to its backend cloudpool. In fact, most REST API operations are forwarded directly to the backend cloudpool as-is. There are two notable exceptions that require the proxy to take action, both of which could lead to a scale-down (for example, a request to set a new `desiredSize` that is lower than the current pool size). When a node needs to be removed, the `kubeaware-cloudpool-proxy` communicates with the Kubernetes API server to determine the current cluster state. These interactions are illustrated in the image below.
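To make this division of labor concrete, the sketch below shows one way such a proxy could pass most requests straight through to the backend while intercepting a desired-size update. The endpoint path (`/pool/size`), the request body, and the function names are assumptions for illustration, not the actual cloudpool API or the proxy's source code.

```go
package proxy

import (
	"bytes"
	"encoding/json"
	"io"
	"net/http"
	"net/http/httputil"
	"net/url"
)

// setDesiredSizeRequest mirrors a (hypothetical) set-desired-size request body.
type setDesiredSizeRequest struct {
	DesiredSize int `json:"desiredSize"`
}

// NewHandler returns an http.Handler that forwards cloudpool API calls to the
// backend cloudpool, but intercepts desired-size updates so that scale-downs
// can be carried out gracefully.
func NewHandler(backendURL *url.URL, currentSize func() int, gracefulScaleDown func(toSize int) error) http.Handler {
	forward := httputil.NewSingleHostReverseProxy(backendURL)
	mux := http.NewServeMux()

	// Hypothetical set-desired-size endpoint; consult the cloudpool REST API
	// for the real path and payload.
	mux.HandleFunc("/pool/size", func(w http.ResponseWriter, r *http.Request) {
		body, err := io.ReadAll(r.Body)
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		var req setDesiredSizeRequest
		if err := json.Unmarshal(body, &req); err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		if req.DesiredSize < currentSize() {
			// A scale-down: select a victim node, evacuate its pods, and
			// only then have the backend cloudpool terminate the machine.
			if err := gracefulScaleDown(req.DesiredSize); err != nil {
				http.Error(w, err.Error(), http.StatusInternalServerError)
				return
			}
			w.WriteHeader(http.StatusOK)
			return
		}
		// Scale-ups (and no-ops) are passed straight to the backend;
		// restore the request body before forwarding.
		r.Body = io.NopCloser(bytes.NewReader(body))
		forward.ServeHTTP(w, r)
	})

	// Everything else is forwarded to the backend cloudpool as-is.
	mux.Handle("/", forward)
	return mux
}
```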
When asked to scale down, the `kubeaware-cloudpool-proxy` takes down nodes in a controlled manner by:

1. Carefully determining which (if any) nodes are candidates for being removed. A node qualifies as a scale-down candidate if it satisfies all of the following conditions:
   - it does not carry the `cluster-autoscaler.kubernetes.io/scale-down-disabled` annotation,
   - it is not running the Kubernetes API server (that is, no pod in the `kube-system` namespace named `kube-apiserver-<host>` or having a `component` label with value `kube-apiserver`),
   - it is `Ready` and `Schedulable`.
2. Selecting the "best" victim node to kill (if at least one candidate was found in the prior step). In this context, the "best" node is typically the least loaded node: the node with the fewest pods that need to be evacuated to another node (see the sketch after this list).
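The following is a minimal sketch of what the candidate filtering and victim selection described above could look like against the Kubernetes API types (`k8s.io/api/core/v1`); the function names are illustrative and not taken from the actual proxy source.

```go
package kube

import (
	"strings"

	corev1 "k8s.io/api/core/v1"
)

const scaleDownDisabledAnnotation = "cluster-autoscaler.kubernetes.io/scale-down-disabled"

// isScaleDownCandidate reports whether a node may be considered for removal,
// given the pods currently scheduled on it.
func isScaleDownCandidate(node corev1.Node, podsOnNode []corev1.Pod) bool {
	// Must not be explicitly protected from scale-downs.
	if _, disabled := node.Annotations[scaleDownDisabledAnnotation]; disabled {
		return false
	}
	// Must not be running the Kubernetes API server.
	for _, pod := range podsOnNode {
		if pod.Namespace == "kube-system" &&
			(strings.HasPrefix(pod.Name, "kube-apiserver-") ||
				pod.Labels["component"] == "kube-apiserver") {
			return false
		}
	}
	// Must be Schedulable ...
	if node.Spec.Unschedulable {
		return false
	}
	// ... and Ready.
	for _, cond := range node.Status.Conditions {
		if cond.Type == corev1.NodeReady {
			return cond.Status == corev1.ConditionTrue
		}
	}
	return false
}

// pickVictim selects the least loaded candidate: the one with the fewest
// pods that would need to be evacuated.
func pickVictim(candidates []corev1.Node, podsByNode map[string][]corev1.Pod) *corev1.Node {
	var victim *corev1.Node
	for i := range candidates {
		node := &candidates[i]
		if victim == nil || len(podsByNode[node.Name]) < len(podsByNode[victim.Name]) {
			victim = node
		}
	}
	return victim
}
```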
`build.sh` builds the binary and runs all tests (see `build.sh --help` for build options). The built binary is placed under `bin/`; the main binary is `kubeaware-cloudpool-proxy`. Test coverage output is placed under `build/coverage/` and can be viewed as HTML via:
go tool cover -html build/coverage/<package>.out
The `kubeaware-cloudpool-proxy` requires a JSON-formatted configuration file. It has the following structure:
{
    "server": {
        "timeout": "60s"
    },
    "apiServer": {
        "url": "https://<host>:<port>",
        "auth": {
            ... authentication mechanism ...
        },
        "timeout": "10s"
    },
    "backend": {
        "url": "http://<host>:<port>",
        "timeout": "300s"
    }
}
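For reference, a configuration file of this shape could be modeled and loaded in Go roughly as sketched below; the struct and function names are hypothetical, not the proxy's actual types.

```go
package config

import (
	"encoding/json"
	"os"
)

// Config is a sketch of the configuration file structure. Duration values
// such as "60s" are kept as strings and can be parsed with time.ParseDuration
// where needed.
type Config struct {
	Server struct {
		Timeout string `json:"timeout"`
	} `json:"server"`
	APIServer struct {
		URL  string `json:"url"`
		Auth struct {
			KubeConfigPath string `json:"kubeConfigPath,omitempty"`
			ClientCertPath string `json:"clientCertPath,omitempty"`
			ClientKeyPath  string `json:"clientKeyPath,omitempty"`
			CACertPath     string `json:"caCertPath,omitempty"`
		} `json:"auth"`
		Timeout string `json:"timeout"`
	} `json:"apiServer"`
	Backend struct {
		URL     string `json:"url"`
		Timeout string `json:"timeout"`
	} `json:"backend"`
}

// Load reads and parses a JSON configuration file.
func Load(path string) (*Config, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	var cfg Config
	if err := json.Unmarshal(data, &cfg); err != nil {
		return nil, err
	}
	return &cfg, nil
}
```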
The authentication part can be specified either with a concrete client certificate/key pair and a CA cert, or via a kubeconfig file. With a kubeconfig file, the `auth` is specified as follows:
...
"apiServer": {
"url": "https://<host>:<port>",
"auth": {
"kubeConfigPath": "/home/me/.kube/config"
}
},
...
With a specific client cert/key pair, the `auth` configuration looks as follows:
...
"apiServer": {
"url": "https://<host>:<port>",
"auth": {
"clientCertPath": "/path/to/admin.pem",
"clientKeyPath": "/path/to/admin-key.pem",
"caCertPath": "/path/to/ca.pem",
}
},
...
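As a rough sketch of how these two auth variants typically map onto a Kubernetes client configuration with client-go: the hypothetical `Auth` struct below mirrors the `auth` section of the config file, and the actual proxy code may construct its client differently.

```go
package kubeauth

import (
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/clientcmd"
)

// Auth holds the values from the "auth" section of the config file.
type Auth struct {
	KubeConfigPath string
	ClientCertPath string
	ClientKeyPath  string
	CACertPath     string
}

// RESTConfig builds a client-go rest.Config from the apiServer settings.
func RESTConfig(apiServerURL string, auth Auth) (*rest.Config, error) {
	if auth.KubeConfigPath != "" {
		// kubeconfig-based auth: let client-go parse the kubeconfig file.
		return clientcmd.BuildConfigFromFlags("", auth.KubeConfigPath)
	}
	// cert-based auth: point client-go at the pem-encoded cert/key/CA files.
	return &rest.Config{
		Host: apiServerURL,
		TLSClientConfig: rest.TLSClientConfig{
			CertFile: auth.ClientCertPath,
			KeyFile:  auth.ClientKeyPath,
			CAFile:   auth.CACertPath,
		},
	}, nil
}
```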
The fields carry the following semantics:

- `server`: proxy server settings.
  - `timeout`: read timeout on client requests. Default: `60s`.
- `apiServer`: settings for the Kubernetes API server.
  - `url`: the base address used to contact the API server. For example, `https://master:6443`.
  - `auth`: client authentication credentials.
    - `kubeConfigPath`: a file system path to a kubeconfig file of the kind used by `kubectl`. When specified, it takes precedence over the other `auth` settings and the `url`.
    - `clientCertPath`: a file system path to a pem-encoded API server client certificate. Not needed when `kubeConfigPath` is specified.
    - `clientKeyPath`: a file system path to a pem-encoded API server client key. Not needed when `kubeConfigPath` is specified.
    - `caCertPath`: a file system path to a pem-encoded CA cert for the API server. Not needed when `kubeConfigPath` is specified.
  - `timeout`: request timeout used when communicating with the API server. Default: `60s`.
- `backend`: settings for communicating with the backend cloudpool that the proxy sits in front of.
  - `url`: the base URL where the backend cloudpool can be reached. For example, `http://cloudpool:9010`.
  - `timeout`: the connection timeout to use when contacting the backend. Default: `300s`.

After building, run the proxy via:
./bin/kubeaware-cloudpool-proxy --config-file=<path>
To enable a different glog log level, use something like:
./bin/kubeaware-cloudpool-proxy --config-file=<path> --v=4
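For reference, the `--v` flag sets glog's verbosity threshold; the minimal sketch below (a hypothetical example, not code from this project) shows how v-leveled log statements relate to it.

```go
package main

import (
	"flag"

	"github.com/golang/glog"
)

func main() {
	// glog registers its flags (including -v and -logtostderr) on the
	// standard flag set; they take effect after flag.Parse().
	flag.Parse()

	glog.Info("always logged")
	// Only emitted when the verbosity level is 4 or higher (e.g. --v=4).
	glog.V(4).Info("verbose diagnostic output")
	glog.Flush()
}
```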
To build a docker image, run
./build.sh --docker
To run the docker image, run something similar to:
docker run --rm -p 8080:8080 \
-v <config-dir>:/etc/elastisys \
-v <kubessl-dir>:/etc/kubessl \
elastisys/kubeaware-cloudpool-proxy:1.0.0 \
--config-file=/etc/elastisys/config.json --port 8080
In this example, `<config-dir>` is a host directory that contains a `config.json` file for the `kubeaware-cloudpool-proxy`. Furthermore, `<kubessl-dir>` must contain the pem-encoded certificate/key/CA files required to talk to the Kubernetes API server. These cert files are referenced from the `config.json`, which, in this case, could look something like:
{
    "apiServer": {
        "url": "https://<hostname>",
        "auth": {
            "clientCertPath": "/etc/kubessl/admin.pem",
            "clientKeyPath": "/etc/kubessl/admin-key.pem",
            "caCertPath": "/etc/kubessl/ca.pem"
        }
    },
    "backend": {
        "url": "http://<hostname>:9010",
        "timeout": "10s"
    }
}
dep is used for dependency management. Make sure it is installed. To introduce a new dependency, add it to `Gopkg.toml`, edit some piece of code to import a package from the dependency, and then run:

dep ensure

to get the right version into the `vendor` folder.
The regular `go test` command can be used for testing. To test a certain package, and to see logs (for a certain glog v-level), run something like:
go test -v ./pkg/kube -args -v=4 -logtostderr=true
For some tests, mock clients are used to fake interactions with “backend services”. More specifically, these interfaces are `KubeClient`, `CloudPoolClient`, and `NodeScaler`. Should any of these interfaces change, the mocks need to be recreated (before editing the test code to modify expectations, etc.). This can be achieved via the mockery tool:
go get github.com/vektra/mockery/...
Generate the mocks:
mockery -dir pkg/kube/ -name KubeClient -output pkg/kube/mocks
mockery -dir pkg/kube/ -name NodeScaler -output pkg/proxy/mocks
mockery -dir pkg/cloudpool/ -name CloudPoolClient -output pkg/proxy/mocks
The generated mocks end up under the respective `mocks/` directories (`pkg/kube/mocks/` and `pkg/proxy/mocks/`).
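As an illustration of how mockery-generated mocks are typically used in a test, the sketch below stubs a made-up `NodeLister` interface using testify's `mock` package; the real tests would use the generated `KubeClient`, `CloudPoolClient`, and `NodeScaler` mocks instead, with their actual method names.

```go
package example

import (
	"testing"

	"github.com/stretchr/testify/mock"
)

// MockNodeLister shows, in essence, what mockery generates for an interface:
// a struct that embeds mock.Mock and records/replays calls.
type MockNodeLister struct {
	mock.Mock
}

// ListNodes is a hypothetical interface method for illustration only.
func (m *MockNodeLister) ListNodes() ([]string, error) {
	args := m.Called()
	return args.Get(0).([]string), args.Error(1)
}

func TestWithMock(t *testing.T) {
	m := new(MockNodeLister)
	// Set up the expectation: ListNodes returns two node names.
	m.On("ListNodes").Return([]string{"worker-1", "worker-2"}, nil)

	nodes, err := m.ListNodes()
	if err != nil || len(nodes) != 2 {
		t.Fatalf("unexpected result: %v, %v", nodes, err)
	}
	// Verify that all expected calls were made.
	m.AssertExpectations(t)
}
```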
In some cases, we would like to see more rapid utilization of newly introduced worker nodes, to make sure that they immediately start accepting a share of the workload. Typically, what we’ve seen so far is that a new node gets started, but once it is up it tends to be very lightly loaded (if loaded at all). It would be nice to see some pods being pushed over to the new node. Furthermore, it would be useful to make sure that all required docker images are pulled to new nodes as early as possible, to avoid unnecessary delays later when pods are scheduled onto the node.